LIMIT LAWS FOR RANDOM VECTORS WITH
AN EXTREME COMPONENT

By Janet E. Heffernan and Sidney I. Resnick¹

Lancaster University and Cornell University 

Models based on assumptions of multivariate regular variation and hidden regular variation provide ways to describe a broad range of extremal dependence structures when marginal distributions are heavy tailed. Multivariate regular variation provides a rich description of extremal dependence in the case of asymptotic dependence, but fails to distinguish between exact independence and asymptotic independence. Hidden regular variation addresses this problem by requiring components of the random vector to be simultaneously large but on a smaller scale than the scale for the marginal distributions. In doing so, hidden regular variation typically restricts attention to that part of the probability space where all variables are simultaneously large. However, since under asymptotic independence the largest values do not occur in the same observation, the region where variables are simultaneously large may not be of primary interest. A different philosophy was offered in the paper of Heffernan and Tawn [J. R. Stat. Soc. Ser. B Stat. Methodol. 66 (2004) 497-546] which allows examination of distributional tails other than the joint tail. This approach used an asymptotic argument which conditions on one component of the random vector and finds the limiting conditional distribution of the remaining components as the conditioning variable becomes large. In this paper, we provide a thorough mathematical examination of the limiting arguments building on the orientation of Heffernan and Tawn [J. R. Stat. Soc. Ser. B Stat. Methodol. 66 (2004) 497-546]. We examine the conditions required for the assumptions made by the conditioning approach to hold, and highlight similarities and differences between the new and established methods.

1. Introduction. Extreme value theory motivates statistical models for the tails of multivariate probability distributions. All such theory relies on some form of asymptotic argument; it is this limiting argument which forces us into the distributional tails and allows the examination of the extremal behavior of random vectors.

The first such arguments relied upon limiting behavior imposed by considering componentwise maxima of random vectors [15, 19, 30, 34]. This approach was extended by Coles and Tawn [5, 6] and de Haan and de Ronde [16] to a multivariate analogue of the one-dimensional threshold methods of Davison and Smith [8] and Smith [39]. These methods provide a rich class of models to describe asymptotic dependence but cannot distinguish between asymptotic independence and exact independence. In response to this weakness, theory and models offering a richer description of asymptotic independence behavior have been developed by Heffernan and Resnick [20], Ledford and Tawn [24, 25, 26], Maulik and Resnick [27] and Resnick [36]. The assumptions underlying this broader class of models have been termed hidden regular variation, which elaborates the concept of the coefficient of tail dependence.

Models based on assumptions of multivariate regular variation and hidden regular variation have a common reliance on limiting procedures in which all vector components are scaled by functions increasing to infinity. In the case of asymptotic dependence, reliance only on multivariate regular variation is sufficient, since in this case the largest values of the components of the random vector tend to occur together. However, models based on multivariate regular variation fail to distinguish between asymptotic independence and exact independence. As such, they provide an inadequate description of dependence within the asymptotic independence class. Hidden regular variation attempts to repair this defect by allowing a different scale function which gives nontrivial limit behavior when vector components are simultaneously large. Hidden regular variation, as typically formulated, provides a more satisfactory description of the joint tail of the distribution for asymptotically independent variables. However, this approach still has practical limitations in applications where interest is in tail regions other than the joint tail. These other tail regions are of practical significance since, under asymptotic independence, the largest values of the components of the random vector tend not to occur in the same observation.

The philosophy of examining distributional tails in which one or more, but not necessarily all, of the vector components are simultaneously large was explained by Heffernan and Tawn [21]. They focused on a single variable being large by conditioning on one component of the random vector and finding the limiting conditional distribution of the remaining components as the conditioning variable becomes large. Simulation studies in [21] suggested that this alternative approach is useful in accurately describing a range of qualitatively different dependence structures, including asymptotic dependence, asymptotic independence and negative dependence. The approach is flexible and readily applicable to general d-dimensional distributions. However, this new basis for modeling multivariate extremes was criticized in the discussion of the paper as lacking a rigorous theoretical underpinning. The discussion highlighted the need for further work to clarify how the approach extends and/or differs from established methodologies which rely on multivariate regular variation and hidden regular variation.

In this paper, we use the philosophy of Heffernan and Tawn [21] and offer a mathematical framework for a theory of conditional distributions given that a component is large. We have changed the formulation of Heffernan and Tawn [21] for two reasons. First, it is difficult to construct an asymptotic theory based on regular conditional distributions, which are readily manageable only when smooth densities are assumed. Second, our formulation readily allows for connections to classical multivariate extreme value theory and regular variation.

1.1. Content of the paper. Here are more details about the content of the paper. We consider the distribution of a bivariate random vector $(X, Y)$ on $\mathbb{R}^2$ under the condition that $Y$ is large. Generalizations could be made to the case of a $(d+1)$-dimensional vector

$$(\mathbf{X}, Y) := (X^{(1)}, \dots, X^{(d)}, Y),$$

where we seek conditional limits of $\mathbf{X}$ given that $Y$ is large. However, we leave such generalizations to subsequent investigations. We assume the distribution function $F$ of $Y$ is in the domain of attraction of an extreme value distribution $G_\gamma(x)$, written $F \in D(G_\gamma)$. This means there exist functions $a(t) > 0$ and $b(t) \in \mathbb{R}$ such that

$$F^t(a(t)y + b(t)) \to G_\gamma(y) \qquad (t \to \infty) \tag{1}$$

weakly, where

$$G_\gamma(y) = \exp\{-(1 + \gamma y)^{-1/\gamma}\}, \qquad 1 + \gamma y > 0,\ \gamma \in \mathbb{R}, \tag{2}$$

and the expression on the right is interpreted as $e^{-e^{-y}}$ if $\gamma = 0$. See, for example, [7, 9, 12, 31, 34]. We can and do assume

$$b(t) = \left(\frac{1}{1 - F}\right)^{\leftarrow}(t),$$

where for a nondecreasing function $U$ we define the left continuous inverse

$$U^{\leftarrow}(t) = \inf\{y : U(y) \ge t\}.$$

Setting $\bar F = 1 - F$, relation (1) is equivalent to

$$t\bar F(a(t)y + b(t)) \to (1 + \gamma y)^{-1/\gamma}, \qquad 1 + \gamma y > 0, \tag{3}$$

or, taking inverses,

$$\frac{b(tx) - b(t)}{a(t)} \to \frac{x^\gamma - 1}{\gamma}, \qquad x > 0, \tag{4}$$

where the right side is interpreted as $\log x$ if $\gamma = 0$.

For convenience we write $\mathbb{E}_\gamma := \{y \in \mathbb{R} : 1 + \gamma y > 0\}$. When considering vague convergence, it is convenient to close the interval $\{y \in \mathbb{R} : 1 + \gamma y > 0\}$ on the right and denote this closure by $\bar{\mathbb{E}}_\gamma$. So, for instance, $\bar{\mathbb{E}}_0 = (-\infty, \infty]$.
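As a quick numerical illustration of (1) and (4), the following Python sketch assumes, purely for illustration, that $F$ is the standard exponential distribution; the text does not single out this case. Here $\gamma = 0$, $a(t) \equiv 1$ and $b(t) = (1/(1-F))^{\leftarrow}(t) = \log t$. The left side of (1) should therefore approach the Gumbel distribution $G_0(y) = e^{-e^{-y}}$, and the ratio in (4) should equal $\log x$.

```python
import math

def gumbel_cdf(y):
    # G_0(y) = exp(-exp(-y)): the gamma = 0 member of the family (2)
    return math.exp(-math.exp(-y))

def exp_cdf(y):
    # illustrative F: standard exponential, F(y) = 1 - exp(-y) for y > 0
    return -math.expm1(-y) if y > 0 else 0.0

def b(t):
    # b(t) = (1/(1-F))^{<-}(t); here 1/(1-F(y)) = e^y, so b(t) = log t (and a(t) = 1)
    return math.log(t)

def lhs_1(t, y):
    # left side of (1): F^t(a(t) y + b(t)) with a(t) = 1
    return exp_cdf(y + b(t)) ** t

def lhs_4(t, x):
    # left side of (4): (b(tx) - b(t)) / a(t); the gamma = 0 limit is log x
    return b(t * x) - b(t)

for t in (1e2, 1e4, 1e6):
    worst = max(abs(lhs_1(t, y) - gumbel_cdf(y)) for y in (-1.0, 0.0, 2.0))
    print(f"t={t:.0e}  max |F^t(y + b(t)) - G_0(y)| = {worst:.2e}")
```

The discrepancies shrink roughly like $1/t$. The exponential case is used only because every quantity is available in closed form.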

In Section 2, we explore the implications of assuming the existence of:

1. Scaling function $a(\cdot) > 0$ and centering function $b(\cdot) \in \mathbb{R}$ so that (1) holds for $F(x) = P[Y \le x]$;

2. Scaling function $\alpha(\cdot) > 0$, centering function $\beta(\cdot) \in \mathbb{R}$ and a nonnull Radon measure $\mu$ on Borel subsets of $[-\infty, \infty] \times \bar{\mathbb{E}}_\gamma$, such that for each fixed $y \in \mathbb{E}_\gamma$,

(a) $\mu([-\infty, x] \times (y, \infty])$ is not a degenerate distribution function in $x$,

(b) $\mu([-\infty, \infty] \times (y, \infty]) < \infty$,

(c) and

$$tP\left[\frac{X - \beta(t)}{\alpha(t)} \le x,\ \frac{Y - b(t)}{a(t)} > y\right] \to \mu([-\infty, x] \times (y, \infty]) \tag{5}$$

at continuity points $(x, y)$ of the limit.

If we interpret (5) as vague convergence (cf. Section A.3) in $M_+([-\infty, \infty] \times \bar{\mathbb{E}}_\gamma)$, the Radon measures on $[-\infty, \infty] \times \bar{\mathbb{E}}_\gamma$, then in fact (5) implies $F \in D(G_\gamma)$ for some $\gamma \in \mathbb{R}$. We will also see that (5) is equivalent to assuming the existence of the conditional limiting distribution of the scaled and centered $X$ variable given that $Y$ is extreme:

$$P\left[\frac{X - \beta \circ b^{\leftarrow}(t)}{\alpha \circ b^{\leftarrow}(t)} \le x \,\middle|\, Y > t\right] \to \mu([-\infty, x] \times (0, \infty]) \tag{6}$$

as $t$ converges to the right end point of $F$. This observation motivates our focus on the convergence (5).

Thus we make a different assumption from that of Heffernan and Tawn [21]. In (6) we condition on the event $Y > t$ rather than on $Y = t$ as in [21]. Conditioning on $Y = t$ requires regular conditional distributions, which are only defined up to almost everywhere equivalence. Our formulation also has a natural connection with extreme value theory, as it implies that $Y$ is in a domain of attraction. In cases where densities exist, the two formulations are similar. See Section 2.5.

Having established conditions for the existence of a limit in (5), in Section 3 we characterize the class of attainable limiting measures. These measures are found either to be product measures or to have a spectral form after a standardization procedure and a transformation to polar coordinates. The standardization renders (5) into a standard multivariate regular variation condition on the cone $[0, \infty] \times (0, \infty]$ and puts us in familiar territory. Relating (5) to standard multivariate regular variation allows us to identify the class of possible limit measures [32, 34, 37].

Section 4 is motivated by the Heffernan and Tawn [21] approach. Instead of normalizing $X$ by deterministic functions of the threshold $t$, we normalize by functions of the precise value of $Y$ occurring with $X$. This leads to a product limit form in all cases.

In Section 5, we highlight connections between assumption (5) and standard assumptions of multivariate regular variation and hidden regular variation. In particular, we show that under multivariate regular variation, (5) assumes something beyond multivariate regular variation only in the presence of asymptotic independence.

Section 6 illustrates our results with a range of examples. Of particular interest is the bivariate Normal example, which exhibits a transformation of $X$ for which the limit (5) does not exist. This leads to Section 7, in which we explore how flexible one can be in the choice of measurement units in which to record $X$ such that the limit measure in (5) does exist. Our results suggest how to construct change of variable functions which will give such a limit.

Section 8 returns in more detail to the modeling assumptions made by Heffernan and Tawn [21] which motivated the work of this paper. It discusses the implications of the new results for their conditional approach to modeling multivariate extreme values.

1.2. Symbol and concept glossary. The Appendix contains several appendices reviewing and referencing needed background. We merely list here some concepts and symbols; explanations and references in the appendices can be consulted as needed.

vectors: Bold lower case is reserved for deterministic vectors and bold upper case for random vectors.
$M_+(\mathbb{E})$: The Radon measures on $\mathbb{E}$.
$RV_\rho$: The regularly varying functions with index $\rho$.
$G_\gamma$: The extreme value distribution given in (2).
$\mathbb{E}_\gamma$: $\{x : 1 + \gamma x > 0\}$.
$\bar{\mathbb{E}}_\gamma$: The closure on the right of the interval $\mathbb{E}_\gamma$.
$D(G_\gamma)$: The domain of attraction of the extreme value distribution $G_\gamma$. This is the set of $F$'s satisfying (1). Note that for $\gamma > 0$, $F \in D(G_\gamma)$ is equivalent to $1 - F \in RV_{-1/\gamma}$.

2. Basic results. In this section we give some implications of (5) and the 
assumptions (1), (2) given in Section 1. 



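2.1. Standardization of $Y$. Without loss of generality, we may assume $Y$ is heavy tailed and $F \in D(G_1)$. The usual standardization procedure in extreme value theory (e.g., [34], Chapter 5, [17], Chapter 6.1.2, [32], Section 6.5.6) means that (1) implies, for $x > 0$, as $t \to \infty$,

$$tP\left[\frac{b^{\leftarrow}(Y)}{t} > x\right] = tP\left[\frac{Y - b(t)}{a(t)} > \frac{b(tx) - b(t)}{a(t)}\right] \to \left(1 + \gamma\,\frac{x^\gamma - 1}{\gamma}\right)^{-1/\gamma} = x^{-1}.$$

Note that if the distribution $F$ of $Y$ is continuous, $b^{\leftarrow}(Y)$ has a Pareto distribution. In any case, $b^{\leftarrow}(Y)$ always has a distribution tail which is asymptotically Pareto. For $y > 0$, (5) and (4) imply

$$tP\left[\frac{X - \beta(t)}{\alpha(t)} \le x,\ \frac{b^{\leftarrow}(Y)}{t} > y\right] = tP\left[\frac{X - \beta(t)}{\alpha(t)} \le x,\ \frac{Y - b(t)}{a(t)} > \frac{b(ty) - b(t)}{a(t)}\right] \to \begin{cases} \mu\left([-\infty, x] \times \left(\dfrac{y^\gamma - 1}{\gamma}, \infty\right]\right), & \text{if } \gamma \ne 0, \\[2ex] \mu([-\infty, x] \times (\log y, \infty]), & \text{if } \gamma = 0. \end{cases} \tag{7}$$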



So, at the expense of replacing $Y$ by $b^{\leftarrow}(Y)$, theoretical development proceeds without loss of generality by replacing the conditions around (5) with

$$\begin{cases} \mu([-\infty, x] \times (y, \infty]) \text{ is not a degenerate distribution function in } x \text{ for each } y > 0, \\[1ex] P[Y \le t] \in D(G_1), \qquad \lim_{t\to\infty} tP[Y > t] = 1, \\[1ex] tP\left[\dfrac{X - \beta(t)}{\alpha(t)} \le x,\ \dfrac{Y}{t} > y\right] \to \mu([-\infty, x] \times (y, \infty]), \qquad x \in \mathbb{R},\ y > 0, \end{cases} \tag{8}$$

at continuity points $(x, y)$ of the limit.

We refer to (8) as the basic convergence with the $Y$-variable standardized.
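The standardization $Y \mapsto b^{\leftarrow}(Y)$ can be checked by simulation. The sketch below is a hypothetical illustration that takes $Y$ to be standard Gumbel. Because $F$ is then continuous and strictly increasing, $b^{\leftarrow}(y) = 1/(1 - F(y))$, and $b^{\leftarrow}(Y)$ is exactly standard Pareto. The empirical value of $tP[b^{\leftarrow}(Y)/t > x]$ should therefore be close to $x^{-1}$, in line with the second line of (8).

```python
import math
import random

def b_inverse(y):
    # b^{<-}(y) = 1 / (1 - F(y)) for the standard Gumbel F(y) = exp(-exp(-y));
    # -expm1(-exp(-y)) computes 1 - F(y) accurately even when F(y) is close to 1
    return 1.0 / (-math.expm1(-math.exp(-y)))

def sample_gumbel(rng):
    # inverse-transform sampling of a standard Gumbel variable
    u = rng.random()
    while u == 0.0:  # guard against log(0)
        u = rng.random()
    return -math.log(-math.log(u))

def scaled_tail(sample, t, x):
    # empirical version of t * P[b^{<-}(Y) / t > x]; should be close to 1 / x
    return t * sum(1 for s in sample if s / t > x) / len(sample)

rng = random.Random(12345)
z = [b_inverse(sample_gumbel(rng)) for _ in range(200_000)]
for x in (0.5, 1.0, 2.0):
    print(x, round(scaled_tail(z, 20.0, x), 3), 1.0 / x)
```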
Remark 1. The argument leading to (8) shows that we are free to change the marginal distribution of the $Y$-variable without disturbing the conditional convergence (6). We will see in Section 6 that this is not always possible for the $X$-variable.

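We reiterate the connection with conditional modeling when (8) is assumed. For $x$ which are continuity points of $H(x) := \mu([-\infty, x] \times (1, \infty])$,

$$\begin{aligned} H_t(\alpha(t)x + \beta(t)) &:= P\left[\frac{X - \beta(t)}{\alpha(t)} \le x \,\middle|\, Y > t\right] = \frac{P[(X - \beta(t))/\alpha(t) \le x,\ Y > t]}{P[Y > t]} \\ &\sim tP\left[\frac{X - \beta(t)}{\alpha(t)} \le x,\ \frac{Y}{t} > 1\right] \to \mu([-\infty, x] \times (1, \infty]) =: H(x). \end{aligned} \tag{9}$$

Here $H_t$ is the conditional distribution function of $X$ given $Y > t$, and the asymptotic equivalence uses $tP[Y > t] \to 1$. Interpreting (8) as vague convergence in $M_+([-\infty, \infty] \times (0, \infty])$, we obtain from marginal convergence that

$$H(\infty) = \mu([-\infty, \infty] \times (1, \infty]) = 1.$$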

2.2. Properties of the functions $\alpha(\cdot)$ and $\beta(\cdot)$. The following is an initial attempt to understand the properties of the functions $\alpha(\cdot)$ and $\beta(\cdot)$.

Proposition 1. Suppose $(X, Y)$ satisfies the standard form condition (8). Then there exist two functions $\psi_1(\cdot)$ and $\psi_2(\cdot)$ such that for all $c > 0$,

$$\lim_{t\to\infty} \frac{\alpha(tc)}{\alpha(t)} = \psi_1(c) \tag{10}$$

and

$$\lim_{t\to\infty} \frac{\beta(tc) - \beta(t)}{\alpha(t)} = \psi_2(c). \tag{11}$$

The convergence in (10) and (11) is uniform on compact subsets of $(0, \infty)$.



Proof. Pick $c > 0$. For all but an at most countable set $\Lambda$ of $x$-values, $(x, 1)$ and $(x, c^{-1})$ are continuity points of $\mu$. For $x \in \Lambda^c$, on the one hand we have (9), and on the other we have

$$\begin{aligned} \lim_{t\to\infty} P\left[\frac{X - \beta(tc)}{\alpha(tc)} \le x \,\middle|\, Y > t\right] &= \lim_{t\to\infty} tP\left[\frac{X - \beta(tc)}{\alpha(tc)} \le x,\ \frac{Y}{t} > 1\right] \\ &= \lim_{t\to\infty} \frac{1}{c}\, tcP\left[\frac{X - \beta(tc)}{\alpha(tc)} \le x,\ \frac{Y}{tc} > c^{-1}\right] \\ &= c^{-1}\mu([-\infty, x] \times (c^{-1}, \infty]) =: H^{(c)}(x). \end{aligned} \tag{12}$$

Thus the family $\{H_t\}$ converges under two different normalizations:

$$H_t(\alpha(t)x + \beta(t)) \to H(x), \qquad H_t(\alpha(tc)x + \beta(tc)) \to H^{(c)}(x).$$

The convergence to types theorem (see, e.g., [10] or [35], page 275) implies that (10) and (11) hold and also that

$$H^{(c)}(x) = H(\psi_1(c)x + \psi_2(c)). \tag{13}$$

To prove local uniform convergence in (10) and (11), replace $c > 0$ in the argument with $c(t)$, where $c(t) \to c \in (0, \infty)$. Then (10) and (11) still hold. Since $\psi_1$ and $\psi_2$ are continuous (see next paragraph), the result follows from continuous convergence. See [34], page 2, or [23]. □

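From (10), we have that $\alpha(\cdot)$ is regularly varying with some index $\rho \in \mathbb{R}$, written $\alpha \in RV_\rho$, so that $\psi_1(x) = x^\rho$. (See [34], page 14, [4, 10, 11, 12, 38].) The function $\psi_2(x)$ may be identically zero. However, if it is not, then from [11], page 16, we have

$$\psi_2(x) = \begin{cases} k(x^\rho - 1)/\rho, & \text{if } \rho \ne 0,\ x > 0, \\ k\log x, & \text{if } \rho = 0,\ x > 0, \end{cases} \tag{14}$$

for some $k \ne 0$. More detailed information is also available:

(i) If $\rho > 0$, then $\beta(\cdot) \in RV_\rho$ and $\beta(t) \sim (k/\rho)\alpha(t)$. So it is enough to scale $X$ in (8), with a consequent location change in the $x$-variable for $\mu$.

(ii) If $\rho = 0$, then $\beta(\cdot) \in \Pi(\alpha)$ and $\alpha \in RV_0$. So $\alpha$ is the auxiliary function of the $\Pi$-function $\beta$.

(iii) If $\rho < 0$, then $\beta(\infty) = \lim_{t\to\infty}\beta(t)$ exists and is finite, and

$$\beta(\infty) - \beta(t) \in RV_\rho, \qquad \beta(\infty) - \beta(t) \sim \frac{k}{|\rho|}\,\alpha(t).$$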
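The limits (10) and (11) and case (i) of (14) can be illustrated numerically. The functions below are hypothetical choices, not taken from the text. The scaling is $\alpha(t) = t^{\rho}(1 + 1/\log t) \in RV_\rho$ with $\rho = 1/2$, and the centering is $\beta(t) = (k/\rho)\alpha(t)$ with $k = 2$, so that case (i) holds exactly. The computed ratios should approach $\psi_1(c) = c^\rho$ and $\psi_2(c) = k(c^\rho - 1)/\rho$.

```python
import math

RHO, K = 0.5, 2.0  # hypothetical index rho > 0 and constant k in (14)

def alpha(t):
    # hypothetical scaling alpha(t) = t^rho (1 + 1/log t), a member of RV_rho
    return t ** RHO * (1.0 + 1.0 / math.log(t))

def beta(t):
    # hypothetical centering beta(t) = (k/rho) alpha(t), matching case (i)
    return (K / RHO) * alpha(t)

def psi1_hat(t, c):
    return alpha(t * c) / alpha(t)             # ratio in (10)

def psi2_hat(t, c):
    return (beta(t * c) - beta(t)) / alpha(t)  # ratio in (11)

c = 2.0
for t in (1e3, 1e6, 1e12):
    err1 = psi1_hat(t, c) - c ** RHO
    err2 = psi2_hat(t, c) - K * (c ** RHO - 1.0) / RHO
    print(f"t={t:.0e}  psi1 error={err1:.2e}  psi2 error={err2:.2e}")
```

Because $\beta$ is an exact multiple of $\alpha$ here, the second error is just $k/\rho$ times the first. The slowly varying factor $1 + 1/\log t$ makes the convergence logarithmically slow.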



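Case (iii) can be reduced to case (i) by a change of variable. Assume $k = 1$. This can be arranged by absorbing $|k|$ into $\alpha(\cdot)$ and, if $k < 0$, replacing $X$ by $-X$; either step merely rescales or reflects the first coordinate of $\mu$. Then $|\rho|(\beta(\infty) - \beta(t)) \sim \alpha(t)$, and in case (iii) the convergence to types theorem applied to (8) gives

$$tP\left[\frac{X - \beta(\infty) + [\beta(\infty) - \beta(t)]}{|\rho|(\beta(\infty) - \beta(t))} \le x,\ \frac{Y}{t} > y\right] \to \mu([-\infty, x] \times (y, \infty]).$$

Write

$$\tilde X := -\frac{1}{X - \beta(\infty)}, \qquad \tilde\beta(t) := \frac{1}{|\rho|(\beta(\infty) - \beta(t))}, \tag{15}$$

so that, supposing $P[X < \beta(\infty)] = 1$, for $x > 0$,

$$\begin{aligned} tP\left[\frac{\tilde X}{\tilde\beta(t)} \le x,\ \frac{Y}{t} > y\right] &= tP\left[\frac{X - \beta(\infty)}{|\rho|(\beta(\infty) - \beta(t))} \le -\frac{1}{x},\ \frac{Y}{t} > y\right] \\ &= tP\left[\frac{X - \beta(\infty)}{|\rho|(\beta(\infty) - \beta(t))} + \frac{1}{|\rho|} \le -\frac{1}{x} + \frac{1}{|\rho|},\ \frac{Y}{t} > y\right] \\ &\to \mu\left(\left[-\infty, -\frac{1}{x} + \frac{1}{|\rho|}\right] \times (y, \infty]\right) =: \tilde\mu([-\infty, x] \times (y, \infty]). \end{aligned} \tag{16}$$

Since case (iii) can be reduced to case (i), it does not need separate theoretical attention.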
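The first equality in (16) is a purely algebraic equivalence of events. A minimal Python check, assuming $\rho < 0$, $\beta(t) < \beta(\infty)$, $X < \beta(\infty)$ and $x > 0$, compares the two events directly. The values are drawn at random and are hypothetical, not taken from any model in the text; the count of disagreements should be zero.

```python
import random

def event_left(X, x, beta_inf, beta_t, rho):
    # {X~ / beta~(t) <= x} with X~ = -1/(X - beta(inf)), beta~(t) = 1/(|rho| (beta(inf) - beta(t)))
    x_tilde = -1.0 / (X - beta_inf)
    beta_tilde = 1.0 / (abs(rho) * (beta_inf - beta_t))
    return x_tilde / beta_tilde <= x

def event_right(X, x, beta_inf, beta_t, rho):
    # {(X - beta(inf)) / (|rho| (beta(inf) - beta(t))) <= -1/x}, the middle expression in (16)
    z = (X - beta_inf) / (abs(rho) * (beta_inf - beta_t))
    return z <= -1.0 / x

def count_mismatches(n=20_000, seed=0):
    rng = random.Random(seed)
    bad = 0
    for _ in range(n):
        beta_inf = rng.uniform(-5.0, 5.0)
        beta_t = beta_inf - rng.uniform(0.01, 3.0)  # beta(t) < beta(inf)
        X = beta_inf - rng.uniform(0.001, 10.0)     # X < beta(inf)
        x = rng.uniform(0.01, 10.0)                 # x > 0
        rho = -rng.uniform(0.1, 3.0)                # rho < 0, case (iii)
        bad += event_left(X, x, beta_inf, beta_t, rho) != event_right(X, x, beta_inf, beta_t, rho)
    return bad

print(count_mismatches())
```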



2.3. Conditions for the limit $\mu$ to be a product measure. It turns out that $\mu$ is a product measure if and only if $\psi_1 \equiv 1$ and $\psi_2 \equiv 0$.

Proposition 2. We have $\mu = H \times \nu_1$, where $\nu_1((y, \infty]) = y^{-1}$, $y > 0$ (i.e., $\mu([-\infty, x] \times (y, \infty]) = H(x)y^{-1}$), iff for all $c > 0$,

$$\psi_1(c) = \lim_{t\to\infty} \frac{\alpha(tc)}{\alpha(t)} = 1, \qquad \psi_2(c) = \lim_{t\to\infty} \frac{\beta(tc) - \beta(t)}{\alpha(t)} = 0. \tag{17}$$

Proof. Given that $\mu$ is a product, we have from (9) and (12) that $H^{(c)}(x) = H(x)$. Hence (17) follows from the convergence to types theorem. Conversely, if (17) holds, then $H^{(c)}(x) = H(x)$ by (13), and from (12) we have, for all $c > 0$, $\mu([-\infty, x] \times (c^{-1}, \infty]) = cH(x)$. So for all $y > 0$, $\mu([-\infty, x] \times (y, \infty]) = H(x)y^{-1}$. □

Remark 2. What if $\psi_2 \equiv 0$ but $\psi_1 \not\equiv 1$? Then $\alpha \in RV_\rho$ for some $\rho \in \mathbb{R}$, $\rho \ne 0$, and $\psi_1(c) = c^\rho$ for $c > 0$. The reasoning in the previous proof shows that $\mu$ has the form

$$\mu([-\infty, x] \times (y, \infty]) = y^{-1}H(x/y^\rho) \tag{18}$$

for $x \in \mathbb{R}$ and $y > 0$, where $H$ is a proper nondegenerate probability distribution.



2.4. When the $X$-variable can be standardized. Standardization is the process of transforming variables so that their distributions have regularly varying tails in standard form. See [34], Chapter 5, [17], Chapter 6.1.2, [32], Section 6.5.6. Once standard form regular variation is achieved, limit measures have a scaling property and characterization of these limits becomes possible. We know we can standardize the $Y$ variable. What about the $X$ variable?

It is possible to standardize the $X$-variable if $\beta(\cdot)$ is nondecreasing on the range of $X$ and $\psi_2(\cdot)$ in (11) is not constant. In this case, for $x > 0$,

$$tP\left[\frac{\beta^{\leftarrow}(X)}{t} \le x,\ \frac{Y}{t} > y\right] = tP\left[\frac{X - \beta(t)}{\alpha(t)} \le \frac{\beta(tx) - \beta(t)}{\alpha(t)},\ \frac{Y}{t} > y\right] \to \mu([-\infty, \psi_2(x)] \times (y, \infty]) \tag{19}$$

at continuity points of the limit. We emphasize that there are important cases where $\psi_2(x)$ is identically zero, and where $X$ therefore cannot be standardized by the procedure in (19); see Section 6.1.

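Standardization is also possible if $\psi_2 \equiv 0$, provided $X > 0$ and $\psi_1 \not\equiv 1$, that is, if $\alpha(\cdot) \in RV_\rho$ with $\rho \ne 0$. If $\rho > 0$, then [4], Theorem 3.1.12a, c, page 136, gives $\beta(t)/\alpha(t) \to 0$. By the convergence to types theorem, (8) can then be rewritten as

$$tP\left[\frac{X}{\alpha(t)} \le x,\ \frac{Y}{t} > y\right] \to \mu([0, x] \times (y, \infty]), \qquad x > 0,\ y > 0.$$

Therefore, supposing without loss of generality that $\alpha(\cdot)$ is strictly increasing and continuous (e.g., [38]), we have

$$tP\left[\frac{\alpha^{\leftarrow}(X)}{t} \le x,\ \frac{Y}{t} > y\right] = tP\left[\frac{X}{\alpha(t)} \le \frac{\alpha(tx)}{\alpha(t)},\ \frac{Y}{t} > y\right] \to \mu([0, x^\rho] \times (y, \infty]),$$

and $(\alpha^{\leftarrow}(X), Y)$ are the standardized variables. If $\rho < 0$, [4], Theorem 3.1.10a, c, page 134, implies that $\beta(\infty) := \lim_{t\to\infty}\beta(t)$ exists and is finite, and that $(\beta(\infty) - \beta(t))/\alpha(t) \to 0$. Therefore, if we suppose $P[X < \beta(\infty)] = 1$, we have for $x > 0$,

$$\begin{aligned} \lim_{t\to\infty} tP\left[\frac{(\beta(\infty) - X)^{-1}}{1/\alpha(t)} \le x,\ \frac{Y}{t} > y\right] &= \lim_{t\to\infty} tP\left[\frac{\beta(\infty) - X}{\alpha(t)} \ge x^{-1},\ \frac{Y}{t} > y\right] \\ &= \lim_{t\to\infty} tP\left[\frac{\beta(\infty) - X - (\beta(\infty) - \beta(t))}{\alpha(t)} \ge x^{-1},\ \frac{Y}{t} > y\right] \\ &= \lim_{t\to\infty} tP\left[\frac{X - \beta(t)}{\alpha(t)} \le -x^{-1},\ \frac{Y}{t} > y\right] \\ &= \mu([-\infty, -x^{-1}] \times (y, \infty]). \end{aligned}$$

The variables $((\beta(\infty) - X)^{-1}, Y)$ can then be standardized according to the recipe for the case $\rho > 0$.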
2.4.1. When $\beta(t)$ is monotone. The standardization of the $X$ variable in (19) raises the question of when $\beta$ is monotone. Consider the case where $\psi_2 \not\equiv 0$ and $\psi_2$ is given by (14) and indexed by $\rho \in \mathbb{R}$. When discussing whether $\beta(t)$ is monotone, it is important to remember that $\beta(\cdot)$ is only determined up to the asymptotic equivalence given by the convergence to types theorem.

Consider the following cases.

1. $\rho > 0$: In this case, we have $\beta \in RV_\rho$, and there exists $\tilde\beta(t) \in RV_\rho$ such that $\tilde\beta(\cdot)$ is continuous and strictly increasing to $\infty$ with $\tilde\beta \sim \beta$. (See, e.g., [38].) So without loss of generality, for the case $\rho > 0$, we may assume $\beta(\cdot)$ is continuous and strictly increasing.

2. $\rho < 0$: The transformation described in (15) and (16) shows that the pair $(X, Y)$ can be transformed to $(\tilde X, Y)$ satisfying $\rho > 0$.

3. $\rho = 0$: Suppose first that $\beta(\cdot) \in \Pi_+(\alpha)$; afterwards we consider $\beta \in \Pi_-(\alpha)$. From [18], as reviewed in Section A.2, there exists $\tilde\beta(t)$ which is continuous and strictly increasing and such that $\beta - \tilde\beta = o(\alpha)$. The convergence to types theorem therefore allows us to replace $\beta$ by $\tilde\beta$. Assume this is done, which is tantamount to dropping the tilde. Then there are two cases to consider.

(a) $\beta(\infty) = \infty$.

(b) $\beta(\infty) < \infty$.

For 3(a) it is clear that $\beta(t)$ has the desired properties of being continuous and strictly increasing to $\infty$. For 3(b), proceed as follows to transform $(X, Y)$. Define

$$\tilde X = \frac{1}{\beta(\infty) - X}, \qquad \tilde\beta(t) = \frac{1}{\beta(\infty) - \beta(t)}, \qquad \tilde\alpha(t) = \frac{\alpha(t)}{(\beta(\infty) - \beta(t))^2}. \tag{20}$$

Then $\tilde\beta(t) \uparrow \infty$ is continuous and strictly monotone, $\tilde\beta \in \Pi_+(\tilde\alpha)$, and after some calculation we get

$$tP\left[\frac{\tilde X - \tilde\beta(t)}{\tilde\alpha(t)} \le x,\ \frac{Y}{t} > y\right] = tP\left[\frac{X - \beta(t)}{\alpha(t)} \le \frac{x}{1 + \alpha(t)x/(\beta(\infty) - \beta(t))},\ \frac{Y}{t} > y\right] \to \mu([-\infty, x] \times (y, \infty]).$$

The limit follows because $\tilde\beta \in \Pi_+(\tilde\alpha)$ implies $\tilde\beta(t)/\tilde\alpha(t) \to \infty$, which is identical to $(\beta(\infty) - \beta(t))/\alpha(t) \to \infty$. Thus after the transformation of $(X, Y)$ to $(\tilde X, Y)$, case 3(b) is reduced to case 3(a).
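The calculation behind the display following (20) can be made explicit. Put $u = (X - \beta(t))/\alpha(t)$ and $d = (\beta(\infty) - \beta(t))/\alpha(t)$. Then $(\tilde X - \tilde\beta(t))/\tilde\alpha(t) = ud/(d - u)$. When $X < \beta(\infty)$ (so $d - u > 0$) and $d + x > 0$, we have $ud/(d-u) \le x$ iff $u \le xd/(d + x) = x/(1 + \alpha(t)x/(\beta(\infty) - \beta(t)))$. The sketch below checks this equivalence on randomly drawn hypothetical values that satisfy those two conditions.

```python
import random

def transformed_event(X, x, alpha_t, beta_t, beta_inf):
    # {(X~ - beta~(t)) / alpha~(t) <= x} with X~, beta~, alpha~ as defined in (20)
    x_tilde = 1.0 / (beta_inf - X)
    beta_tilde = 1.0 / (beta_inf - beta_t)
    alpha_tilde = alpha_t / (beta_inf - beta_t) ** 2
    return (x_tilde - beta_tilde) / alpha_tilde <= x

def original_event(X, x, alpha_t, beta_t, beta_inf):
    # {(X - beta(t)) / alpha(t) <= x / (1 + alpha(t) x / (beta(inf) - beta(t)))}
    threshold = x / (1.0 + alpha_t * x / (beta_inf - beta_t))
    return (X - beta_t) / alpha_t <= threshold

def count_mismatches(n=20_000, seed=1):
    rng = random.Random(seed)
    bad = 0
    for _ in range(n):
        beta_inf = rng.uniform(-5.0, 5.0)
        beta_t = beta_inf - rng.uniform(0.1, 5.0)  # beta(t) < beta(inf)
        alpha_t = rng.uniform(0.01, 1.0)
        X = beta_inf - rng.uniform(0.001, 20.0)    # X < beta(inf)
        x = rng.uniform(-0.05, 5.0)                # keeps d + x > 0 since d >= 0.1
        bad += (transformed_event(X, x, alpha_t, beta_t, beta_inf)
                != original_event(X, x, alpha_t, beta_t, beta_inf))
    return bad

print(count_mismatches())
```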
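What if $\beta \in \Pi_-(\alpha)$? Then define

$$\tilde X = -X, \qquad \tilde\beta(t) = -\beta(t), \qquad \tilde\alpha(t) = \alpha(t).$$

Then $\tilde\beta \in \Pi_+(\tilde\alpha)$, and this case reduces to the case $\beta \in \Pi_+(\alpha)$, since

$$tP\left[\frac{\tilde X - \tilde\beta(t)}{\tilde\alpha(t)} \le x,\ \frac{Y}{t} > y\right] = tP\left[\frac{X - \beta(t)}{\alpha(t)} \ge -x,\ \frac{Y}{t} > y\right] \to \mu([-x, \infty] \times (y, \infty]).$$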



2.4.2. Summary. When $\psi_2 \not\equiv 0$, we can standardize the $X$-variable by making the transformation $X \mapsto \tilde X$ and considering the analogue of (8) for $(\tilde X, Y)$. If $\psi_2 \equiv 0$ but $\psi_1(c) = c^\rho$ for $c > 0$ with $\rho \ne 0$, then for $\rho > 0$, $(\alpha^{\leftarrow}(X), Y)$ is a standardized pair, and for $\rho < 0$, $((1/\alpha)^{\leftarrow}((\beta(\infty) - X)^{-1}), Y)$ is a standardized pair.

When the limit $\mu$ is a product measure, $(\psi_1, \psi_2) = (1, 0)$ and standardization is not possible. An example is given in Section 6.1.3, and a proof of the assertion is easy using the change of coordinate system techniques of Section 7.



2.5. Densities. In this section we see what form the basic convergence takes when $(X, Y)$ has a density. Since it is sufficient to suppose that the $Y$-variable has been transformed to the standard case, for this section we assume the following:

1. The pair $(X, Y)$ has density $f(x, y)$.

2. The marginal density $f_Y(y) = \int f(x, y)\,dx$ of the $Y$-variable satisfies
$$f_Y(y) = y^{-2}, \qquad y > 1.$$
Since we have densities, we assume that the transformation to standard form renders $Y$ a Pareto random variable with unit shape parameter.

3. The joint density $f(x, y)$ satisfies
$$t^2\alpha(t)f(\alpha(t)x + \beta(t), ty) \to g(x, y), \tag{21}$$
where the limit $g(x, y) \ge 0$ is not identically zero, is integrable over $[-\infty, \infty] \times (y, \infty]$ for each $y > 0$, and satisfies, for each fixed $v > 0$,
$$v^2 g(u, v) \text{ is a probability density in } u. \tag{22}$$

Proposition 3. With the assumptions just listed, (8) holds with

$$\mu([-\infty, x] \times (y, \infty]) = \int_{u \le x}\int_{v > y} g(u, v)\,dv\,du,$$

and $H(\infty) = \mu([-\infty, \infty] \times (1, \infty]) = 1$.
Proof. We use standard notation for conditional densities. So, for instance, $f_{X \mid Y=v}(u \mid v)$ is the conditional density of $X$ given $Y = v$. We need two facts:

1. First we evaluate the integrand. For $v > 0$, (21) implies
$$f_{(X-\beta(t))/\alpha(t) \mid Y/t = v}(u \mid v) \to v^2 g(u, v) \qquad (t \to \infty). \tag{23}$$
To see this, observe that for $tv > 1$,
$$f_{(X-\beta(t))/\alpha(t) \mid Y/t = v}(u \mid v) = \frac{f_{(X-\beta(t))/\alpha(t),\,Y/t}(u, v)}{f_{Y/t}(v)} = \frac{t\alpha(t)f(\alpha(t)u + \beta(t), tv)}{t\,(tv)^{-2}} = t^2\alpha(t)v^2 f(\alpha(t)u + \beta(t), tv) \to v^2 g(u, v).$$

2. We now show convergence of the integral. For fixed $v$, the function of $u$
$$f_{(X-\beta(t))/\alpha(t) \mid Y/t = v}(u \mid v)$$
is a probability density. Now, for $t > 1/y$, write
$$\begin{aligned} tP\left[\frac{X - \beta(t)}{\alpha(t)} \le x,\ \frac{Y}{t} > y\right] &= t\int_{v > y}\left[\int_{u \le x} f_{(X-\beta(t))/\alpha(t) \mid Y/t = v}(u \mid v)\,du\right] f_{Y/t}(v)\,dv \\ &= \int_{v > y}\left[\int_{u \le x} f_{(X-\beta(t))/\alpha(t) \mid Y/t = v}(u \mid v)\,du\right] v^{-2}\,dv, \end{aligned}$$
since $t f_{Y/t}(v) = v^{-2}$ for $v > 1/t$.

The integral inside the square bracket has an integrand which is a family of probability densities in the variable $u$ (with $v$ fixed), indexed by $t$, which converges to the limiting probability density $v^2 g(u, v)$. Hence by Scheffé's lemma (e.g., [35], page 253),
$$\int_{u \le x} f_{(X-\beta(t))/\alpha(t) \mid Y/t = v}(u \mid v)\,du \to \int_{u \le x} v^2 g(u, v)\,du.$$
The square bracket term is a conditional probability and hence is a function of $v$ bounded almost surely by 1. Since $v^{-2}$ is integrable on $(y, \infty)$, dominated convergence gives (8), as required.

To check the last assertion that $H(\infty) = 1$, note that
$$\mu([-\infty, \infty] \times (1, \infty]) = \int_{u \in [-\infty, \infty]}\int_{v > 1} g(u, v)\,dv\,du = \int_{v > 1}\left(\int v^2 g(u, v)\,du\right) v^{-2}\,dv = \int_{v > 1} v^{-2}\,dv = 1.$$

□
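Condition (22) and the normalization $H(\infty) = 1$ in Proposition 3 can be checked numerically for a concrete limit density. The sketch below assumes the hypothetical density $g(u, v) = 2/(u + v)^3$ on $[0, \infty) \times (0, \infty)$. Integrating it shows that it is the density of the measure $x/((x + y)y)$, $x, y > 0$. Each half-line is mapped onto $(0, 1)$ and integrated with a midpoint rule.

```python
def g(u, v):
    # hypothetical limit density g(u, v) = 2 / (u + v)^3 on [0, inf) x (0, inf)
    return 2.0 / (u + v) ** 3

def integrate_half_line(h, lower=0.0, scale=1.0, n=400):
    # midpoint rule for the integral of h over (lower, inf),
    # via the substitution u = lower + scale * s / (1 - s), s in (0, 1)
    total = 0.0
    for i in range(n):
        s = (i + 0.5) / n
        total += h(lower + scale * s / (1.0 - s)) * scale / (1.0 - s) ** 2
    return total / n

def conditional_mass(v):
    # v^2 times the integral of g(u, v) du; (22) requires this to equal 1
    return v ** 2 * integrate_half_line(lambda u: g(u, v), scale=v)

def h_infinity():
    # mu([0, inf] x (1, inf]): integral over v > 1 of the integral over u of g(u, v)
    inner = lambda v: integrate_half_line(lambda u: g(u, v), scale=v)
    return integrate_half_line(inner, lower=1.0)

for v in (0.5, 1.0, 3.0, 50.0):
    print(v, round(conditional_mass(v), 8))
print(round(h_infinity(), 8))
```

Scaling the inner substitution by $v$ keeps the integrand well resolved for large $v$. For this particular $g$, the transformed integrands are linear or constant, so the midpoint rule is essentially exact.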
Heffernan and Tawn [21] assume that $(X, Y)$ have been transformed to have Gumbel marginal distributions, that is, $P(X \le t) = P(Y \le t) = \exp(-\exp(-t))$ for $t \in \mathbb{R}$. They further assume that for such $(X, Y)$,

$$P\left[\frac{X - \beta(t)}{\alpha(t)} \le x \,\middle|\, Y = t\right] \tag{24}$$

converges to a nondegenerate limit distribution as $t \to \infty$, for some scaling function $\alpha(\cdot) > 0$ and centering function $\beta(\cdot) \in \mathbb{R}$.

Since (23) implies condition (24) of [21], we see that (21) implies (24). This makes explicit the link between our assumption (5) and those of Heffernan and Tawn [21] under the above conditions for densities. We have

$$P\left[\frac{X - \beta(t)}{\alpha(t)} \le x \,\middle|\, Y = ty\right] = \int_{u \le x} f_{(X-\beta(t))/\alpha(t) \mid Y/t = y}(u \mid y)\,du \to \int_{u \le x} y^2 g(u, y)\,du,$$

and letting $y = 1$ gives

$$P\left[\frac{X - \beta(t)}{\alpha(t)} \le x \,\middle|\, Y = t\right] \to \int_{u \le x} g(u, 1)\,du.$$



3. Characterizing the class of limit measures. Assuming the $Y$-variable is standardized, what is the class of limits in (8)? We divide this issue into two parts, depending on whether or not the limit measure $\mu$ is a product.

3.1. The limit measure is a product. In this case not much discussion is required, since for any distribution function $H(x)$ on $\mathbb{R}$ the limit

$$\mu = H \times \nu_1, \quad\text{that is,}\quad \mu([-\infty, x] \times (y, \infty]) = H(x)y^{-1},$$

is possible. To achieve this limit, suppose $X$ and $Y$ are independent random variables with $X$ having distribution $H$ and $Y$ being standard Pareto. Then with $\beta(t) = 0$ and $\alpha(t) = 1$, (8) is satisfied, since $tP[X \le x, Y > ty] = H(x)\,t\,(ty)^{-1} = H(x)y^{-1}$ whenever $ty \ge 1$.
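The construction of Section 3.1 is easy to simulate. The sketch below assumes, for concreteness, $H = N(0, 1)$; this choice is illustrative and not made in the text. It estimates the conditional distribution function in (9) and the left side of (8) with $\alpha(t) = 1$ and $\beta(t) = 0$. These should be close to $H(x)$ and $H(x)y^{-1}$ respectively.

```python
import random
from statistics import NormalDist

def simulate_independent(n, seed=3):
    # X ~ H = N(0, 1) and Y standard Pareto (P[Y > y] = 1/y, y >= 1), independent
    rng = random.Random(seed)
    return [(rng.gauss(0.0, 1.0), 1.0 / (1.0 - rng.random())) for _ in range(n)]

def conditional_cdf(pairs, t, x, alpha_t=1.0, beta_t=0.0):
    # empirical P[(X - beta(t)) / alpha(t) <= x | Y > t], cf. (9)
    exceed = [xv for xv, yv in pairs if yv > t]
    return sum(1 for xv in exceed if (xv - beta_t) / alpha_t <= x) / len(exceed)

def scaled_joint(pairs, t, x, y, alpha_t=1.0, beta_t=0.0):
    # empirical t * P[(X - beta(t)) / alpha(t) <= x, Y / t > y], the left side of (8)
    hits = sum(1 for xv, yv in pairs if (xv - beta_t) / alpha_t <= x and yv / t > y)
    return t * hits / len(pairs)

H = NormalDist().cdf
pairs = simulate_independent(200_000)
for x in (-1.0, 0.0, 1.5):
    print(x, round(conditional_cdf(pairs, 10.0, x), 3), round(H(x), 3),
          round(scaled_joint(pairs, 10.0, x, 2.0), 3), round(H(x) / 2.0, 3))
```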

3.2. The limit measure is not a product. When $\mu$ is not a product, we change coordinate systems, transform $X$ to some $X^*$, and assume $(X^*, Y)$ is a standard pair with

$$tP\left[\left(\frac{X^*}{t}, \frac{Y}{t}\right) \in \cdot\,\right] \xrightarrow{v} \mu^*(\cdot) \quad \text{in } M_+([0, \infty] \times (0, \infty]), \tag{25}$$

where $\mu^*$ is a transformation of $\mu$ as described in Section 2.4.

From (25), we see that the distribution of $(X^*, Y)$ is standard regularly varying with limit measure $\mu^*$ (see [3, 32, 37]) on the cone $[0, \infty] \times (0, \infty]$. Therefore $\mu^*$ is homogeneous of order $-1$:

$$\mu^*(cA) = c^{-1}\mu^*(A), \qquad \forall c > 0, \tag{26}$$
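where $A$ is a Borel subset of $[0, \infty] \times (0, \infty]$. This means $\mu^*$ has a spectral form. We pick a norm. Any norm would do, but for convenience define

$$\|(x, y)\| = |x| + |y|, \qquad (x, y) \in \mathbb{R}^2.$$

Of course, when restricting attention to $[0, \infty] \times (0, \infty]$, the absolute value bars can be dropped. Then the standard argument using homogeneity ([34], Chapter 5) yields, for $r > 0$ and $\Lambda$ a Borel subset of $[0, 1)$,

$$\begin{aligned} \mu^*\left\{(x, y) \in [0, \infty] \times (0, \infty] : x + y > r,\ \frac{x}{x + y} \in \Lambda\right\} &= \mu^*\left(r\left\{(x, y) \in [0, \infty] \times (0, \infty] : x + y > 1,\ \frac{x}{x + y} \in \Lambda\right\}\right) \\ &= r^{-1}\mu^*\left\{(x, y) \in [0, \infty] \times (0, \infty] : x + y > 1,\ \frac{x}{x + y} \in \Lambda\right\} \\ &=: r^{-1}S(\Lambda). \end{aligned}$$

The Radon measure $S$ need not be a finite measure on $[0, 1)$. However, to guarantee that

$$H^*(x) = \mu^*([0, x] \times (1, \infty]) \tag{27}$$

is a probability distribution function, we need

$$\int_0^1 (1 - w)\,S(dw) = 1. \tag{28}$$

This will be clear from the following calculation of the canonical form of $H^*(x)$ for $x > 0$. In the polar coordinates $r = x + y$, $w = x/(x + y)$, (26) means that $\mu^*$ becomes $r^{-2}\,dr\,S(dw)$. So, for $x > 0$,

$$\begin{aligned} \mu^*([0, x] \times (y, \infty]) &= \iint_{\substack{0 \le rw \le x \\ r(1-w) > y \\ 0 \le w < 1}} r^{-2}\,dr\,S(dw) = \int_{r=0}^{\infty}\left(\int_{\substack{0 \le w \le x/r \\ w < 1 - y/r \\ 0 \le w < 1}} S(dw)\right) r^{-2}\,dr \\ &= \int_0^\infty S\left(\left[0, \frac{x}{r} \wedge \left(1 - \frac{y}{r}\right) \wedge 1\right)\right) r^{-2}\,dr \\ &= \int_0^\infty S([0, xv \wedge (1 - yv) \wedge 1))\,dv. \end{aligned} \tag{29}$$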
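Integrating the double integral in reverse order yields the alternative expression

$$\begin{aligned} \mu^*([0, x] \times (y, \infty]) &= \int_{w \in [0, 1)}\left(\int_{y/(1-w) < r \le x/w} r^{-2}\,dr\right) S(dw) = \int_{w \in [0, 1)} \left((1 - w)y^{-1} - wx^{-1}\right)_+ S(dw) \\ &= y^{-1}\int_0^{x/(x+y)} (1 - w)\,S(dw) - x^{-1}\int_0^{x/(x+y)} w\,S(dw). \end{aligned} \tag{30}$$

In particular, letting $x \to \infty$ with $y = 1$ gives $H^*(\infty) = \int_0^1 (1 - w)\,S(dw)$, which is why (28) is needed.

Conclusion: The class of limits $\mu^*$, or of conditional limits

$$H^*(x) = \lim_{t\to\infty} P\left[\frac{X^*}{t} \le x \,\middle|\, Y > t\right],$$

is indexed by Radon measures $S$ on $[0, 1)$ satisfying the integrability condition (28).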

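Example. Suppose $S$ is uniform on $[0, 1)$: $S(dw) = dw/c$, where $c$ is chosen so that (28) is satisfied. The condition $\int_0^1 (1 - w)\,c^{-1}\,dw = 1$ implies $c = 1/2$. Writing $p = x/(x + y)$, (30) yields

$$\mu^*([0, x] \times (y, \infty]) = 2y^{-1}\left(p - \frac{p^2}{2}\right) - 2x^{-1}\,\frac{p^2}{2} = \frac{x}{(x + y)y},$$

and setting $y = 1$ we get a Pareto distribution

$$H^*(x) = \frac{x}{1 + x} = 1 - \frac{1}{1 + x}, \qquad x > 0.$$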
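The uniform example can be cross-checked numerically against both (29) and (30). The sketch below uses plain Python and a midpoint rule; the helper names are illustrative. It takes $S(dw) = 2\,dw$ on $[0, 1)$, checks the normalization (28), and compares both integral representations with the closed form $x/((x + y)y)$.

```python
def s_density(w):
    # uniform spectral measure S(dw) = dw / c with c = 1/2, i.e. S(dw) = 2 dw on [0, 1)
    return 2.0

def s_cdf(a):
    # S([0, a)) for this S, with a clipped to [0, 1]
    return 2.0 * min(max(a, 0.0), 1.0)

def midpoint(f, a, b, n=20_000):
    # composite midpoint rule on [a, b]
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

def normalization_28():
    # left side of (28): integral of (1 - w) S(dw) over [0, 1)
    return midpoint(lambda w: (1.0 - w) * s_density(w), 0.0, 1.0)

def mu_star_30(x, y):
    # mu*([0, x] x (y, inf]) from the last line of (30)
    p = x / (x + y)
    first = midpoint(lambda w: (1.0 - w) * s_density(w), 0.0, p)
    second = midpoint(lambda w: w * s_density(w), 0.0, p)
    return first / y - second / x

def mu_star_29(x, y):
    # mu*([0, x] x (y, inf]) from the last line of (29); the integrand vanishes
    # once 1 - y v <= 0, so the v-range can be cut at 1/y
    return midpoint(lambda v: s_cdf(min(x * v, 1.0 - y * v, 1.0)), 0.0, 1.0 / y)

print(normalization_28())
for x, y in ((1.0, 1.0), (2.0, 1.0), (0.5, 3.0)):
    print(x, y, mu_star_29(x, y), mu_star_30(x, y), x / ((x + y) * y))
```

With $y = 1$, the printed values reproduce $H^*(x) = x/(1 + x)$.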



4. Random norming. In [21], it was necessary to normalize $X$ by a function of the precise value of $Y$ occurring with $X$ to achieve nondegeneracy of the limiting conditional distribution. Motivated by this, we consider normalizing the $X$-variable with a function of $Y$, rather than with a deterministic affine transformation using functions of the threshold $t$ as in (6). This leads to a product form limit in all cases.

It is significant that normalizing by functions of the threshold $t$ in (6) does not result in a product limit in all cases. By contrast, including the precise value of $Y$ occurring with $X$ adds enough detail to the normalization for the limit always to factorize. In statistical applications, the factorization of the limit distribution will constitute a welcome simplification of models based on this limiting form. Indeed, the statistical model of Heffernan and Tawn [21] relies on such factorization to ensure that the residuals formed by normalizing observed values of $X$ by functions of the observed values of $Y$ are independent of the $Y$ values.

We discuss this random normalization in two stages:

• The $X$-variable can be standardized and the limit in (8) is not a product.

• The limit measure $\mu$ in (8) is a product measure.

4.1. The X -variable can be standardized and the limit measure [i is not 
a product. We suppose X can be transformed to X* so that (X*,Y) is a 
standardized pair and (25) holds with limit measure /i*. As in Section 3.2, 
let 5 be the spectral measure of Then we have the following result which 
forms the basis of the estimation procedure proposed in [21]. 

Proposition 4. // (25) holds, then 
X* Y\ 1 

" G x ui in M+([0,oo] x (0,oo]), 



(31) tP 
where for x > 

(32) v\{(x, oo]) = x~ x and G(x) 
This means 



x/(l+x) 



p 



X* 

T 



< X 



Y>t 



G(x), 



(l-w)S(dw). 



x>0. 



Conversely, if (31) holds, then so does (25). 

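Proof. This proof is discussed in Theorem 2.1 of [28]. The outline of the argument is as follows. Applying the map $T_1(x,y) = (x/y, y)$ to (25) yields, after a compactification argument, that

$$ tP\left[\left(\frac{X^*}{Y}, \frac{Y}{t}\right) \in \cdot\,\right] \xrightarrow{v} \mu^* \circ T_1^{-1}. $$

So, using the homogeneity of $\mu^*$ and the polar coordinates $u = rw$, $v = r(1-w)$, in which $\mu^*(dr, dw) = r^{-2}\,dr\,S(dw)$, the limit evaluated on $[0,x] \times (y,\infty]$ is

$$ \begin{aligned} \mu^*\Bigl\{(u,v) : \frac{u}{v} \le x,\ v > y\Bigr\} &= y^{-1}\mu^*\Bigl\{(u,v) : \frac{u}{v} \le x,\ v > 1\Bigr\} \\ &= y^{-1} \iint_{\{rw/(r(1-w)) \le x,\ r(1-w) > 1\}} r^{-2}\,dr\,S(dw) \\ &= y^{-1} \int_{w \le x/(1+x)} \int_{r > 1/(1-w)} r^{-2}\,dr\,S(dw) \\ &= y^{-1} \int_0^{x/(1+x)} (1-w)\,S(dw). \end{aligned} $$

□

The converse proceeds similarly using the map $T_2(x,y) = (xy, y) = T_1^{-1}(x,y)$.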
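A small Monte Carlo check of (31) and (32) may help fix ideas. The pair below is a hypothetical construction chosen for illustration, not one taken from [21] or [28]: $(X^*, Y) = 2R\,(W, 1-W)$ with $R$ standard Pareto ($P[R > r] = r^{-1}$, $r \ge 1$) and $W$ uniform on $[0,1)$, independent of $R$. With the polar coordinates of the proof above, this pair has $tP[Y > t] = 1$ for $t \ge 2$ and spectral measure $S(dw) = 2\,dw$, so (32) gives $G(x) = 1 - (1+x)^{-2}$. For this particular pair the conditional law of $X^*/Y$ given $Y > t$ equals $G$ exactly for every $t \ge 2$, so the sketch should agree with $G$ up to sampling error; the function names are ours.

```python
import random


def limit_G(x):
    """G(x) from (32) with S(dw) = 2 dw on [0, 1]; equals 1 - (1 + x)^(-2) for x >= 0."""
    u = x / (1.0 + x)
    return 2.0 * u - u * u


def ratios_given_exceedance(n, t, seed=0):
    """Simulate the hypothetical pair (X*, Y) = 2R(W, 1 - W); return X*/Y for draws with Y > t."""
    rng = random.Random(seed)
    kept = []
    for _ in range(n):
        r = 1.0 / (1.0 - rng.random())  # standard Pareto: P[R > r] = 1/r, r >= 1
        w = rng.random()                 # angular component W, uniform on [0, 1)
        x_star, y = 2.0 * r * w, 2.0 * r * (1.0 - w)
        if y > t:
            kept.append(x_star / y)
    return kept


def empirical_cdf(sample, x):
    """Fraction of the sample that is <= x."""
    return sum(1 for s in sample if s <= x) / len(sample)


if __name__ == "__main__":
    sample = ratios_given_exceedance(200_000, 10.0, seed=7)
    for x in (0.5, 1.0, 3.0):
        print(f"x={x}: empirical={empirical_cdf(sample, x):.3f}  G(x)={limit_G(x):.3f}")
```

Because $X^*/Y = W/(1-W)$ here, the only thing the exceedance $Y > t$ changes is the law of $W$, which becomes $2(1-w)\,dw$. That tilt is exactly the factor $(1-w)$ in (32).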
4.2. The limit measure μ is a product measure. Now we suppose (8) holds with $\mu = H \times \nu_1$. In this case, from Proposition 2, (10) and (11) hold with $\psi_1(x) = 1$, $\psi_2(x) = 0$.



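Proposition 5. If

$$ tP\left[\frac{X - \beta(t)}{\alpha(t)} \le x,\ \frac{Y}{t} > y\right] \to H(x)\,y^{-1} \qquad (x \in \mathbb{R},\ y > 0) \tag{33} $$

for a nondegenerate probability distribution function $H(x)$, then also

$$ tP\left[\frac{X - \beta(Y)}{\alpha(Y)} \le x,\ \frac{Y}{t} > y\right] \to H(x)\,y^{-1} \qquad (x \in \mathbb{R},\ y > 0) \tag{34} $$

and

$$ P\left[\frac{X - \beta(Y)}{\alpha(Y)} \le x \,\middle|\, Y > t\right] \to H(x). $$

Conversely, if (34) holds and $\alpha(\cdot)$ and $\beta(\cdot)$ satisfy (10), (11) locally uniformly with $\psi_1(x) = 1$ and $\psi_2(x) = 0$, then (33) also holds.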



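Proof. For any $K > y > 0$ we have

$$ tP\left[\frac{X - \beta(Y)}{\alpha(Y)} \le x,\ \frac{Y}{t} \in (y,K]\right] = tP\left[\frac{X - \beta(t)}{\alpha(t)} \le \frac{\alpha(tY/t)}{\alpha(t)}\,x + \frac{\beta(tY/t) - \beta(t)}{\alpha(t)},\ \frac{Y}{t} \in (y,K]\right], $$

and because of local uniform convergence in (10) and (11), this converges to

$$ (H \times \nu_1)\bigl([-\infty,x] \times (y,K]\bigr) = H(x)\bigl(y^{-1} - K^{-1}\bigr). $$

Therefore

$$ \liminf_{t\to\infty} tP\left[\frac{X - \beta(Y)}{\alpha(Y)} \le x,\ \frac{Y}{t} > y\right] \ge \liminf_{t\to\infty} tP\left[\frac{X - \beta(Y)}{\alpha(Y)} \le x,\ \frac{Y}{t} \in (y,K]\right] = H(x)\bigl(y^{-1} - K^{-1}\bigr). $$

Since this is true for all $K > y$, we have

$$ \liminf_{t\to\infty} tP\left[\frac{X - \beta(Y)}{\alpha(Y)} \le x,\ \frac{Y}{t} > y\right] \ge H(x)\,y^{-1}. $$

Also,

$$ \begin{aligned} \limsup_{t\to\infty} tP\left[\frac{X - \beta(Y)}{\alpha(Y)} \le x,\ \frac{Y}{t} > y\right] &\le \lim_{t\to\infty} tP\left[\frac{X - \beta(Y)}{\alpha(Y)} \le x,\ \frac{Y}{t} \in (y,K]\right] + \limsup_{t\to\infty} tP\left[\frac{Y}{t} > K\right] \\ &= H(x)\bigl(y^{-1} - K^{-1}\bigr) + K^{-1}. \end{aligned} $$

Letting $K \to \infty$ provides the other half of the sandwich and (34) is proven. For the converse, write

$$ tP\left[\frac{X - \beta(t)}{\alpha(t)} \le x,\ \frac{Y}{t} \in (y,K]\right] = tP\left[\frac{X - \beta(Y)}{\alpha(Y)} \le \frac{\alpha(t)}{\alpha(Y)}\,x + \frac{\beta(t) - \beta(Y)}{\alpha(Y)},\ \frac{Y}{t} \in (y,K]\right]. $$

Proceed as before using uniform convergence. □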
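The practical point of Proposition 5 shows up clearly in a toy model. The model below is a hypothetical illustration, not one from the paper. Let $Y$ be standard Pareto, let $Z$ be an independent $N(0,1)$ variable, and set $X = \beta(Y) + Z$ with $\beta(t) = \sqrt{\log t}$ and $\alpha \equiv 1$. Since $\beta(tc) - \beta(t) \to 0$ locally uniformly, (10) and (11) hold with $\psi_1 = 1$, $\psi_2 = 0$, and (33) holds with $H = N$. The random normalization $X - \beta(Y) = Z$ is independent of $Y$, so its conditional law given $Y > t$ is exactly $H$ for every $t$. The deterministic normalization $X - \beta(t)$, by contrast, carries a bias of order $1/\sqrt{\log t}$. The sketch compares the two at $x = 0$, where $H(0) = 1/2$.

```python
import math
import random


def beta(t):
    """Hypothetical centering function of the toy model (alpha(t) = 1)."""
    return math.sqrt(math.log(t))


def simulate_given_exceedance(t, n, seed=0):
    """Draw (X, Y) given Y > t, where Y is standard Pareto and X = beta(Y) + Z, Z ~ N(0, 1)."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        y = t / (1.0 - rng.random())      # Y | Y > t is t times a standard Pareto variable
        x = beta(y) + rng.gauss(0.0, 1.0)
        pairs.append((x, y))
    return pairs


def ecdf(values, x):
    """Fraction of values that are <= x."""
    return sum(1 for v in values if v <= x) / len(values)


if __name__ == "__main__":
    t = 1e4
    pairs = simulate_given_exceedance(t, 20_000, seed=11)
    fixed = [x - beta(t) for x, _ in pairs]        # normalized by the threshold, as in (33)
    by_y = [x - beta(y) for x, y in pairs]         # normalized by the observed Y, as in (34)
    print("H(0) = 0.5; fixed:", round(ecdf(fixed, 0.0), 3), " random:", round(ecdf(by_y, 0.0), 3))
```

In statistical terms, this is why residuals formed with $\beta(Y)$ and $\alpha(Y)$ can be treated as independent of the conditioning variable.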



5. Connection to multivariate extreme value theory and asymptotic independence. We now make some comments on the relationship between our conditioned limit condition (8) and multivariate extreme value theory.

Suppose the distribution of (X,Y) is in the domain of attraction of a multivariate extreme value distribution. This means that for i.i.d. replicates $\{(X_i,Y_i), i \ge 1\}$ of (X,Y) there exist centering functions $b_j(t) \in \mathbb{R}$ and scaling functions $a_j(t) > 0$, $j = 1,2$, such that

$$ P\left[\frac{\bigvee_{i=1}^n X_i - b_1(n)}{a_1(n)} \le x,\ \frac{\bigvee_{i=1}^n Y_i - b_2(n)}{a_2(n)} \le y\right] \to G(x,y), \tag{35} $$

where G is a multivariate extreme value distribution. Let the marginal distributions of G be $G_j$, $j = 1,2$. Asymptotic independence means $G(x,y) = G_1(x)G_2(y)$. Define

$$ U_1(x) = \frac{1}{P[X > x]}, \qquad U_2(y) = \frac{1}{P[Y > y]}, $$

$$ \chi_j(x) = \left(\frac{1}{-\log G_j}\right)^{\leftarrow}(x), \quad x > 0,\ j = 1,2, \qquad G^*(x,y) = G\bigl(\chi_1(x), \chi_2(y)\bigr), \quad x > 0,\ y > 0. $$



According to Resnick [34], Proposition 5.10, page 265, we can standardize the condition (35) by transforming $(X,Y) \mapsto (X^*,Y^*) = (U_1(X), U_2(Y))$, and then

$$ P\left[\frac{\bigvee_{i=1}^n X_i^*}{n} \le x,\ \frac{\bigvee_{i=1}^n Y_i^*}{n} \le y\right] \to G^*(x,y), \tag{36} $$

and $G^*$ is max-stable. From [34], Proposition 5.15, page 277 and [32], Section 6.1, this is equivalent to marginal convergence and multivariate regular variation of the distribution of $(X^*,Y^*)$:

$$ tP\left[\left(\frac{X^*}{t}, \frac{Y^*}{t}\right) \in \cdot\,\right] \xrightarrow{v} \nu^*(\cdot) \tag{37} $$

in $M_+([0,\infty]^2 \setminus \{\mathbf{0}\})$. Here $\nu^*$ is a Radon measure on $[0,\infty]^2 \setminus \{\mathbf{0}\}$ satisfying

$$ \nu^*(t\,\cdot) = t^{-1}\nu^*(\cdot). \tag{38} $$

Asymptotic independence means

$$ \nu^*\bigl(([0,x] \times [0,y])^c\bigr) = -\log G^*(x,y) = -\log G^*(x,\infty) - \log G^*(\infty,y) = \nu^*\bigl((x,\infty] \times [0,\infty]\bigr) + \nu^*\bigl([0,\infty] \times (y,\infty]\bigr), $$

and $\nu^*$ concentrates on the lines $\{(x,0) : x > 0\} \cup \{(0,y) : y > 0\}$.

Suppose the domain of attraction condition (37) holds but asymptotic independence does not hold. Condition (37) implies, for $x > 0$, $y > 0$,

$$ tP\left[\frac{X^*}{t} \le x,\ \frac{Y^*}{t} > y\right] \to \nu^*\bigl([0,x] \times (y,\infty]\bigr), $$

and we claim that for fixed $y > 0$, $\nu^*([0,x] \times (y,\infty])$ is not degenerate in x. This follows, for instance, from (38). Conclusion: the domain of attraction condition (37) in standard form without asymptotic independence implies that $(X^*,Y^*)$ satisfies (8). Condition (8) is equivalent to vague convergence on the cone $[0,\infty] \times (0,\infty]$, while the regular variation condition (37) gives vague convergence on the bigger cone $[0,\infty]^2 \setminus \{\mathbf{0}\}$.

Suppose (37) holds with asymptotic independence. Consider (8) with $X^*/t$ in place of $(X - \beta(t))/\alpha(t)$. The nondegeneracy condition in (8) fails because for fixed $y > 0$, $\mu([0,x] \times (y,\infty]) = \nu^*([0,x] \times (y,\infty])$ concentrates all mass at $x = 0$. If one wants (8) to hold, one must make an additional assumption beyond the domain of attraction condition (37), and the $X^*$ variable in (37) must be normalized differently. For a simple particular case which is somewhat familiar, consider the following: suppose we assume the condition (37) with asymptotic independence and in addition we assume that $X^*$ can be normalized by $a(t)$ instead of by t, so that (8) holds in the form

$$ tP\left[\frac{X^*}{a(t)} \le x,\ \frac{Y^*}{t} > y\right] \to \mu\bigl([0,x] \times (y,\infty]\bigr), \qquad x > 0,\ y > 0. \tag{39} $$

From (39) and (37), we have for $0 < a < b < \infty$ and $y > 0$

$$ tP\left[\frac{X^*}{a(t)} \in (a,b],\ \frac{Y^*}{t} > y\right] \to \mu\bigl((a,b] \times (y,\infty]\bigr), \qquad tP\left[\frac{X^*}{t} \in (a,b],\ \frac{Y^*}{t} > y\right] \to 0. $$



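We claim that $t/a(t) \to \infty$, so that $a(\cdot)$ is of smaller order than t. If not, there exist $t_n \to \infty$ and $0 < c < \infty$ with $t_n/a(t_n) \to c$. From the nondegeneracy condition in (8), we may pick $0 < a < b$ such that $\mu((a,b] \times (1,\infty]) > 0$. Since $a(t_n)/t_n \to 1/c$, for large n the interval $\bigl(a\,a(t_n)/t_n,\ b\,a(t_n)/t_n\bigr]$ lies inside $(a/(2c), 2b/c]$, and then

$$ 0 < \mu\bigl((a,b] \times (1,\infty]\bigr) = \lim_{n\to\infty} t_nP\left[\frac{X^*}{t_n} \in \Bigl(a\frac{a(t_n)}{t_n},\ b\frac{a(t_n)}{t_n}\Bigr],\ \frac{Y^*}{t_n} > 1\right] \le \limsup_{n\to\infty} t_nP\left[\frac{X^*}{t_n} \in \Bigl(\frac{a}{2c}, \frac{2b}{c}\Bigr],\ \frac{Y^*}{t_n} > 1\right] = 0, $$

giving a contradiction. So $a(\cdot)$ is of smaller order than t and we have the situation of hidden regular variation [20, 27, 36]; that is, the regular variation condition (37) holds on the big cone $[0,\infty]^2 \setminus \{\mathbf{0}\}$ but a different regular variation condition holds on the smaller cone $[0,\infty] \times (0,\infty]$.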

To summarize: The multivariate extreme value paradigm without asymp- 
totic independence subsumes our conditioned limit condition (5). However, 
in the presence of asymptotic independence, the multivariate extreme value 
condition is refined by (5) which uses a more delicate normalization to track 
mass into the part of the distributional tail where the conditioning variable 
Y is large. 



6. Examples. We give examples to illustrate some intricacies. 

6.1. Bivariate normal. Suppose $N_1, N_2$ are i.i.d. N(0,1) random variables and $|\rho| < 1$. Define $(X,Y) = (\sqrt{1-\rho^2}\,N_1 + \rho N_2,\ N_2)$, which is a bivariate normal vector with means 0, variances 1 and correlation ρ. Denote the standard normal distribution function by $N(x)$. Recall (e.g., from [34], page 71) that we may set

$$ a(t) = \frac{1}{\sqrt{2\log t}}, \qquad b(t) = \left(\frac{1}{1-N}\right)^{\leftarrow}(t) = \sqrt{2\log t} - \frac{(1/2)(\log\log t + \log 4\pi)}{\sqrt{2\log t}} + o(a(t)), \tag{40} $$

and then for $x \in \mathbb{R}$

$$ \lim_{t\to\infty} tP\left[\frac{N_1 - b(t)}{a(t)} > x\right] = e^{-x}. $$
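As a numerical sanity check on (40), the sketch below compares $b(t)$, computed exactly as the $(1 - 1/t)$-quantile of $N$ with Python's `statistics.NormalDist`, against the two-term expansion in (40). It also evaluates $tP[N_1 > b(t) + a(t)x]$, which should approach $e^{-x}$. The function names are ours. Convergence is slow, so this is only a qualitative check.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()  # N(0, 1)


def b_exact(t):
    """b(t) = (1/(1 - N))^{<-}(t), the (1 - 1/t)-quantile of N(0, 1).

    Computed as -N^{-1}(1/t) so that 1 - 1/t is never rounded to 1.0 for huge t.
    """
    return -STD_NORMAL.inv_cdf(1.0 / t)


def a_40(t):
    """Scaling a(t) = 1/sqrt(2 log t) from (40)."""
    return 1.0 / math.sqrt(2.0 * math.log(t))


def b_40(t):
    """Two-term expansion of b(t) from (40), with the o(a(t)) remainder dropped."""
    s = math.sqrt(2.0 * math.log(t))
    return s - 0.5 * (math.log(math.log(t)) + math.log(4.0 * math.pi)) / s


def scaled_tail(t, x):
    """t * P[N_1 > b(t) + a(t) x]; should approach exp(-x) as t grows."""
    return t * STD_NORMAL.cdf(-(b_exact(t) + a_40(t) * x))


if __name__ == "__main__":
    for t in (1e4, 1e8, 1e16):
        err = (b_exact(t) - b_40(t)) / a_40(t)
        print(f"t={t:.0e}  b(t)={b_exact(t):.4f}  (b - b_40)/a = {err:+.4f}  "
              f"t P[N1 > b + a] = {scaled_tail(t, 1.0):.4f} vs e^-1 = {math.exp(-1):.4f}")
```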



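6.1.1. Conditional limits for (X,Y). We begin by discussing the following result learned from [1]. Suppose $N(x)$ is the standard normal distribution function and $n(y)$ is its density. Then

$$ tP\left[X - \rho b(t) \le x,\ \frac{Y - b(t)}{a(t)} > y\right] \to N\Bigl(\frac{x}{\sqrt{1-\rho^2}}\Bigr)e^{-y}, \tag{41} $$

or, standardizing the y-variable,

$$ tP\left[X - \rho b(t) \le x,\ \frac{b^{\leftarrow}(Y)}{t} > y\right] \to N\Bigl(\frac{x}{\sqrt{1-\rho^2}}\Bigr)y^{-1}. \tag{42} $$

Here we claimed $\beta(t) = \rho b(t)$ and $\alpha(t) = 1$. It is well known (e.g., [34], page 71) that $b(\cdot) \in \Pi(a(\cdot))$, and therefore

$$ \frac{\beta(tc) - \beta(t)}{\alpha(t)} = \rho\bigl(b(tc) - b(t)\bigr) = \rho\,\frac{b(tc) - b(t)}{a(t)}\,a(t) \sim \rho\log c \cdot a(t) \to 0. \tag{43} $$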






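Thus $\psi_2(x)$ in (11) is identically 0 and $\psi_1(x) = 1$. We now see why (41) and (42) are true. We write

$$ \begin{aligned} tP\left[X - \rho b(t) \le x,\ \frac{Y - b(t)}{a(t)} > y\right] &= tP\left[\sqrt{1-\rho^2}\,N_1 + \rho N_2 - \rho b(t) \le x,\ \frac{N_2 - b(t)}{a(t)} > y\right] \\ &= \int_{a(t)y + b(t)}^{\infty} P\bigl[\sqrt{1-\rho^2}\,N_1 + \rho s - \rho b(t) \le x\bigr]\,t\,n(s)\,ds \\ &= \int_y^{\infty} P\bigl[\sqrt{1-\rho^2}\,N_1 + \rho(a(t)u + b(t)) - \rho b(t) \le x\bigr]\,t\,a(t)\,n(a(t)u + b(t))\,du \\ &\sim \int_y^{\infty} P\bigl[\sqrt{1-\rho^2}\,N_1 \le x - \rho a(t)u\bigr]\,e^{-u}\,du, \end{aligned} $$

since $t\,a(t)\,n(a(t)u + b(t)) \to e^{-u}$. Using the fact that $a(t) \to 0$, we get convergence to

$$ \int_y^{\infty} P\bigl[\sqrt{1-\rho^2}\,N_1 \le x\bigr]\,e^{-u}\,du = N\Bigl(\frac{x}{\sqrt{1-\rho^2}}\Bigr)e^{-y}, $$

as claimed.

Conclusion: The limit measure is a product measure, $(\psi_1, \psi_2) = (1, 0)$ and $\alpha(t) = 1$. We have an illustration of Proposition 2.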
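The conditional form of (42) can be checked by simulation. In the sketch below (function names are ours), Y is drawn from its conditional law given $Y > b(t)$ by inverting the upper normal tail. The sketch then forms $X = \sqrt{1-\rho^2}\,N_1 + \rho Y$ and compares the empirical distribution of $X - \rho b(t)$ with $N(x/\sqrt{1-\rho^2})$. For finite t the overshoot $Y - b(t)$, which is of order $a(t)$, biases the comparison slightly, in line with the term $\rho a(t)u$ in the derivation above. The values $\rho = 0.5$ and $t = 10^{12}$ are arbitrary choices.

```python
import math
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()


def centred_x_given_exceedance(rho, t, n, seed=0):
    """Draws of X - rho*b(t) given Y > b(t), for (X, Y) = (sqrt(1-rho^2) N1 + rho N2, N2)."""
    rng = random.Random(seed)
    b_t = -STD_NORMAL.inv_cdf(1.0 / t)
    scale = math.sqrt(1.0 - rho * rho)
    out = []
    for _ in range(n):
        u = 1.0 - rng.random()               # u in (0, 1]
        y = -STD_NORMAL.inv_cdf(u / t)       # P[Y > y | Y > b(t)] = t * (1 - N(y))
        x = scale * rng.gauss(0.0, 1.0) + rho * y
        out.append(x - rho * b_t)
    return out


def limit_cdf(x, rho):
    """Limit N(x / sqrt(1 - rho^2)) in the conditional form of (42)."""
    return STD_NORMAL.cdf(x / math.sqrt(1.0 - rho * rho))


def ecdf(values, x):
    """Fraction of values that are <= x."""
    return sum(1 for v in values if v <= x) / len(values)


if __name__ == "__main__":
    sample = centred_x_given_exceedance(rho=0.5, t=1e12, n=40_000, seed=5)
    for x in (-1.0, 0.0, 1.0):
        print(x, round(ecdf(sample, x), 3), round(limit_cdf(x, 0.5), 3))
```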

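6.1.2. Exponential marginals for X. In light of the standard form result (42), it is tempting to look at limits for $(b^{\leftarrow}(X), b^{\leftarrow}(Y))$, but this turns out not to work. The reason for this is explored in Section 6.1.3. Instead, following [21], we consider $(\log b^{\leftarrow}(X), \log b^{\leftarrow}(Y))$. Thus we can transform X to have exponential marginals but not Pareto marginals.

We show the standard form (here $0 < \rho < 1$, so that the scaling $\rho b(t)$ is positive)

$$ tP\left[\frac{\log b^{\leftarrow}(X) - \log b^{\leftarrow}(\rho b(t))}{\rho b(t)} \le x,\ \frac{b^{\leftarrow}(Y)}{t} > y\right] \to N\Bigl(\frac{x}{\sqrt{1-\rho^2}}\Bigr)y^{-1}. \tag{44} $$

The verification of (44) needs the following lemma.

Lemma 1. The function

$$ V(t) := -\log\bigl(1 - N(\log t)\bigr) = \log b^{\leftarrow}(\log t) $$

is Π-varying with auxiliary function $g(t) = \log t$; that is, $V \in \Pi(\log t)$.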
Proof. To prove membership in the Π-class, it suffices, according to de Haan [14] (see alternatively [34], page 30), to show $V'(t) \in RV_{-1}$, and then the auxiliary function can be taken to be $tV'(t)$. So it suffices to show

$$ \bigl(-\log(1 - N(\log t))\bigr)' = \frac{n(\log t)\,t^{-1}}{1 - N(\log t)} \in RV_{-1}. $$

By Mills' ratio, $1 - N(s) \sim n(s)/s$ as $s \to \infty$, so the derivative satisfies

$$ \frac{n(\log t)\,t^{-1}}{1 - N(\log t)} \sim \frac{n(\log t)\,t^{-1}}{n(\log t)/\log t} = t^{-1}\log t \in RV_{-1}. $$

□
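Lemma 1 can also be checked numerically. Write $s = \log t$. The Π-variation ratio $(V(tx) - V(t))/g(t)$ then becomes $\bigl(-\log(1 - N(s + \log x)) + \log(1 - N(s))\bigr)/s$, and it should approach $\log x$. The sketch below (names are ours) evaluates the normal tail through `math.erfc` to keep precision far in the tail. The largest value $s = 35$ is chosen only to stay clear of floating-point underflow.

```python
import math


def V_at_log(s):
    """V(e^s) = -log(1 - N(s)), using 1 - N(s) = erfc(s / sqrt 2) / 2."""
    return -math.log(0.5 * math.erfc(s / math.sqrt(2.0)))


def pi_ratio(s, x):
    """(V(t x) - V(t)) / g(t) with t = e^s and g(t) = log t = s."""
    return (V_at_log(s + math.log(x)) - V_at_log(s)) / s


if __name__ == "__main__":
    for s in (5.0, 15.0, 35.0):
        # targets: log e = 1 and log e^2 = 2
        print(s, round(pi_ratio(s, math.e), 4), round(pi_ratio(s, math.e ** 2), 4))
```

The error shrinks roughly like $(\log x)^2/(2\log t)$. That matches $V(t) \approx (\log t)^2/2$, and it is why Π-varying normalizations converge slowly in practice.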



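To show (44), we use (42) and the delta method. Writing $\log b^{\leftarrow}(X) = V(e^X)$ and applying Lemma 1 with $T = e^{\rho b(t)}$ and $g(T) = \rho b(t)$, the left-hand side of (44) is

$$ tP\left[\frac{V\bigl(e^{X - \rho b(t)}\,e^{\rho b(t)}\bigr) - V\bigl(e^{\rho b(t)}\bigr)}{\rho b(t)} \le x,\ \frac{b^{\leftarrow}(Y)}{t} > y\right] \to P\bigl[\log e^{\sqrt{1-\rho^2}\,N_1} \le x\bigr]\,y^{-1} = N\Bigl(\frac{x}{\sqrt{1-\rho^2}}\Bigr)y^{-1}. $$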

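Here is the conditional form of (44), where X is transformed to have exponential marginals:

$$ \lim_{t\to\infty} P\left[\frac{\log b^{\leftarrow}(X) - \log b^{\leftarrow}(\rho b(t))}{\rho b(t)} \le x \,\middle|\, Y > b(t)\right] = \lim_{t\to\infty} P\left[\frac{\log b^{\leftarrow}(X) - \log b^{\leftarrow}(\rho t)}{\rho t} \le x \,\middle|\, Y > t\right] = N\Bigl(\frac{x}{\sqrt{1-\rho^2}}\Bigr). $$

The conditional form of (42), where the marginal distribution is normal, has the same limit:

$$ \lim_{t\to\infty} P\bigl[X - \rho b(t) \le x \,\big|\, Y > b(t)\bigr] = \lim_{t\to\infty} P\bigl[X - \rho t \le x \,\big|\, Y > t\bigr] = N\Bigl(\frac{x}{\sqrt{1-\rho^2}}\Bigr). $$

This result seems natural when one observes that the normal distribution is in the domain of attraction of the Gumbel distribution.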

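After transformation of X to exponential marginals, we have for (44), writing $\bar N = 1 - N$,

$$ \beta(t) = -\log \bar N(\rho b(t)), \qquad \alpha(t) = \rho b(t), $$

and again $\psi_2(t) = 0$, since, by Mills' ratio $\bar N(s) \sim n(s)/s$,

$$ \begin{aligned} \frac{\beta(tc) - \beta(t)}{\alpha(t)} &= \frac{\log\bigl(\bar N(\rho b(t))/\bar N(\rho b(tc))\bigr)}{\rho b(t)} = \frac{\log\bigl(n(\rho b(t))/n(\rho b(tc))\bigr)}{\rho b(t)} + o(1) \\ &= \frac{\log e^{(\rho^2/2)(b^2(tc) - b^2(t))}}{\rho b(t)} + o(1) = \frac{\rho}{2}\bigl(b(tc) - b(t)\bigr)\frac{b(tc) + b(t)}{b(t)} + o(1) \\ &= \rho\bigl(b(tc) - b(t)\bigr)(1 + o(1)) + o(1) \to 0, \end{aligned} $$

using the same argument as in (43). (This provides another illustration of Proposition 2.)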
6.1.3. Why X cannot be transformed to Pareto. It is noteworthy that one cannot transform X to have Pareto marginals and expect the analogue of (41) to hold. Here is the explanation, which also relates to the discussion in Section 7.

Suppose that for some choice of centering and scaling $\alpha_2(t) > 0$, $\beta_2(t) \in \mathbb{R}$, the limit

$$ \lim_{t\to\infty} tP\left[\frac{b^{\leftarrow}(X) - \beta_2(t)}{\alpha_2(t)} \le x,\ \frac{b^{\leftarrow}(Y)}{t} > y\right] \tag{45} $$

exists and is nondegenerate in the sense of condition (iii) stated at the beginning of Section 2. This expression (45) equals

$$ \lim_{t\to\infty} tP\left[X - \rho b(t) \le b\bigl(\alpha_2(t)x + \beta_2(t)\bigr) - \rho b(t),\ \frac{b^{\leftarrow}(Y)}{t} > y\right], \tag{46} $$

and from (42) we would have, for some nondecreasing limit $\psi(x)$, that as $t \to \infty$,

$$ b\bigl(\alpha_2(t)x + \beta_2(t)\bigr) - \rho b(t) \to \psi(x). \tag{47} $$

Furthermore, the limit in (45) would have to be

$$ N\Bigl(\frac{\psi(x)}{\sqrt{1-\rho^2}}\Bigr)y^{-1}. \tag{48} $$

Inverting (47), we would need

$$ \frac{b^{\leftarrow}(y + \rho b(t)) - \beta_2(t)}{\alpha_2(t)} \to \psi^{\leftarrow}(y). $$

Changing variables leads to

$$ \frac{b^{\leftarrow}(\log(tx)) - \beta_2\bigl(b^{\leftarrow}(\log t/\rho)\bigr)}{\alpha_2\bigl(b^{\leftarrow}(\log t/\rho)\bigr)} \to \psi^{\leftarrow}(\log x). $$

If $\psi^{\leftarrow}$ is not constant, then ([11], page 16) the function

$$ x \mapsto b^{\leftarrow}(\log x) = \frac{1}{1 - N(\log x)} $$

is either regularly varying with positive index or it is Π-varying. Neither of these possibilities is true: by Lemma 1, $\log b^{\leftarrow}(\log x) = V(x)$ grows like $(\log x)^2/2$, so $b^{\leftarrow}(\log x)$ grows faster than any power of x. If $\psi^{\leftarrow}$ is constant, then the limit (48) fails the nondegeneracy assumptions.

So assuming the nondegenerate limit exists in (45) leads to a contradiction. This illustrates the restrictions in our ability to standardize the X variable discussed in Section 2.4.



6.2. Heavy tailed examples. In this section, we present examples of heavy 
tailed random variables possessing asymptotic independence. 
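6.2.1. Mixture of independent standard regularly varying random variables I: positive ρ. Suppose nonnegative random variables (U,V) have a joint distribution which is standard regularly varying; that is, there is a limit measure ν on $[0,\infty]^2 \setminus \{\mathbf{0}\}$ such that

$$ tP\left[\left(\frac{U}{t}, \frac{V}{t}\right) \in \cdot\,\right] \xrightarrow{v} \nu $$

in $M_+([0,\infty]^2 \setminus \{\mathbf{0}\})$. For example, (U,V) could be max-stable ([34], Chapter 5, [17]) with exponent measure ν. Suppose $(U_i,V_i)$, $i = 1,2$, are i.i.d. copies of (U,V). For $0 < \rho < 1$, define

$$ (X,Y) = B\bigl(U_1, V_1^{\rho}\bigr) + (1 - B)\bigl(U_2^{\rho}, V_2\bigr), \tag{49} $$

where $P[B = 0] = P[B = 1] = \frac12$, and B is independent of $(U_i, V_i)$, $i = 1,2$. Observe that for any $x > 0$, $y > 0$, since $1/\rho > 1$ makes $tP[V_1^{\rho} > ty] \to 0$ and $tP[U_2^{\rho} > tx] \to 0$,

$$ \begin{aligned} tP\left[\Bigl(\frac{X}{t} \le x,\ \frac{Y}{t} \le y\Bigr)^c\right] &= \frac{t}{2}P\left[\frac{U_1}{t} > x \text{ or } \frac{V_1^{\rho}}{t} > y\right] + \frac{t}{2}P\left[\frac{U_2^{\rho}}{t} > x \text{ or } \frac{V_2}{t} > y\right] \\ &= \frac{t}{2}P[U_1 > tx] + o(1) + \frac{t}{2}P[V_2 > ty] + o(1) \\ &\to \frac12\Bigl(\nu\bigl((x,\infty] \times [0,\infty]\bigr) + \nu\bigl([0,\infty] \times (y,\infty]\bigr)\Bigr). \end{aligned} \tag{50} $$

So (X,Y) is standard regularly varying, in a domain of attraction of a multivariate extreme value distribution, and possesses asymptotic independence. The asymptotic independence holds even if (U,V) has no asymptotic independence.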

Now observe that

$$ \begin{aligned} tP\left[\frac{X}{t^{\rho}} \le x,\ \frac{Y}{t} > y\right] &= \frac{t}{2}P\bigl[U_1 \le t^{\rho}x,\ V_1^{\rho} > ty\bigr] + \frac{t}{2}P\bigl[U_2^{\rho} \le t^{\rho}x,\ V_2 > ty\bigr] \\ &= \frac{t}{2}P\bigl[U_1 \le t^{\rho}x,\ V_1 > t^{1/\rho}y^{1/\rho}\bigr] + \frac{t}{2}P\bigl[U_2 \le t x^{1/\rho},\ V_2 > ty\bigr] \\ &\to 0 + \frac12\,\nu\bigl([0,x^{1/\rho}] \times (y,\infty]\bigr) =: \mu\bigl([0,x] \times (y,\infty]\bigr). \end{aligned} \tag{51} $$

If (U,V) possesses asymptotic independence, then $\nu((0,\infty]^2) = 0$ and the nondegeneracy assumption for μ stated in (8) fails, since for fixed $y > 0$ the function of x given by $\nu([0,x^{1/\rho}] \times (y,\infty])$ concentrates at $x = 0$. So for this example, (X,Y) is standard regularly varying and asymptotically independent, and provided (U,V) does not possess asymptotic independence, we can refine the asymptotic independence to get the limit in (8). This gives an example of case (i) of (14) with index ρ, $\alpha(t) = t^{\rho}$ and $\beta(t) = (1/\rho)\alpha(t)$. The conditional limit distribution can most simply be written as

$$ \lim_{t\to\infty} P\left[\frac{X}{t^{\rho}} \le x \,\middle|\, Y > t\right] = \nu\bigl([0,x^{1/\rho}] \times (1,\infty]\bigr). $$

(Note that the normalization of the X variable may have to be properly scaled by $ct^{\rho}$ for some $c > 0$ to ensure the limit is a probability distribution.)
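A simulation of (49) makes this conditional limit concrete. The sketch below uses a hypothetical special case not discussed above: $U = V = R$ with R standard Pareto. This makes (U,V) completely dependent, hence not asymptotically independent, and places ν on the diagonal with $\nu(\{(r,r) : r > s\}) = s^{-1}$, so that $\nu([0,x^{1/\rho}] \times (1,\infty]) = (1 - x^{-1/\rho})^+$. Only one of the two copies enters each draw, so one Pareto variable per draw suffices. The rare draws with $B = 1$ and $Y > t$ cause a small finite-t bias. The values $\rho = 0.5$ and $t = 50$ are arbitrary choices.

```python
import random


def normalized_x_given_exceedance(rho, t, n, seed=0):
    """Simulate (49) with U = V = R standard Pareto; return X / t**rho for draws with Y > t."""
    rng = random.Random(seed)
    kept = []
    for _ in range(n):
        r = 1.0 / (1.0 - rng.random())    # P[R > r] = 1/r for r >= 1
        if rng.random() < 0.5:            # B = 1: (X, Y) = (U1, V1**rho)
            x, y = r, r ** rho
        else:                             # B = 0: (X, Y) = (U2**rho, V2)
            x, y = r ** rho, r
        if y > t:
            kept.append(x / t ** rho)
    return kept


def limit_H(x, rho):
    """nu([0, x**(1/rho)] x (1, inf]) for nu on the diagonal: (1 - x**(-1/rho))^+."""
    if x <= 0.0:
        return 0.0
    return max(0.0, 1.0 - x ** (-1.0 / rho))


def ecdf(values, x):
    """Fraction of values that are <= x."""
    return sum(1 for v in values if v <= x) / len(values)


if __name__ == "__main__":
    sample = normalized_x_given_exceedance(rho=0.5, t=50.0, n=400_000, seed=3)
    for x in (1.5, 2.0, 4.0):
        print(x, round(ecdf(sample, x), 3), round(limit_H(x, 0.5), 3))
```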

The details of this construction can be repeated in modestly greater generality with (49) modified as

$$ (X,Y) = B\bigl(U_1, h(V_1)\bigr) + (1 - B)\bigl(h(U_2), V_2\bigr) \tag{52} $$

with $h \in RV_{\rho}$ and $h(t)/t \to 0$. As before, (X,Y) is standard regularly varying and asymptotically independent, and

$$ tP\left[\left(\frac{X}{h(t)}, \frac{Y}{t}\right) \in \cdot\,\right] \xrightarrow{v} \mu, \tag{53} $$

where μ is given as in (51). The condition $h(t)/t \to 0$ is necessary and sufficient for (X,Y) to be asymptotically independent, as can be seen by examining the calculations leading to (50).



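6.2.2. Mixture of independent standard regularly varying random variables II: negative ρ. To exemplify case (iii) of (14), where the index is negative, suppose (52) and (53) still hold, $h(t)/t \to 0$ and (U,V) are not asymptotically independent. Define $\tilde X = 1/X$, $\tilde h = 1/h \in RV_{-\rho}$, and a measure $\tilde\mu$ on $[0,\infty] \times (0,\infty]$ by

$$ \tilde\mu\bigl([0,x] \times (y,\infty]\bigr) = \mu\bigl([1/x, \infty] \times (y,\infty]\bigr). $$

Then

$$ tP\left[\left(\frac{\tilde X}{\tilde h(t)}, \frac{Y}{t}\right) \in \cdot\,\right] \xrightarrow{v} \tilde\mu $$

in $M_+([0,\infty] \times (0,\infty])$. The reason this works is that the first space in the product $[0,\infty] \times (0,\infty]$ is compact:

$$ tP\left[\frac{\tilde X}{\tilde h(t)} \le x,\ \frac{Y}{t} > y\right] = tP\left[\frac{X}{h(t)} \ge \frac1x,\ \frac{Y}{t} > y\right] \to \mu\bigl([1/x, \infty] \times (y,\infty]\bigr). $$

So using $(\tilde X, Y)$, we have an example of case (iii) of (14) with index $-\rho < 0$ and $\alpha(t) = \beta(t) = \tilde h(t)$. The conditioned limit distribution is

$$ H(x) = \lim_{t\to\infty} P\left[\frac{\tilde X}{\tilde h(t)} \le x \,\middle|\, Y > t\right] = \mu\bigl([1/x, \infty] \times (1,\infty]\bigr). $$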
6.2.3. Mixture of independent standard regularly varying random variables III: ρ = 0. Finally, suppose (52) still holds, but this time suppose $h \in \Pi(g)$ is nondecreasing and Π-varying with auxiliary function $g(t)$. [E.g., we could take $h(t) = \log t$, $g(t) = 1$.] Then $h(t)/t \to 0$ as $t \to \infty$, so (X,Y) is standard regularly varying as well as asymptotically independent. To verify this we need the fact that if ξ is either U or V, then

$$ tP\left[\frac{h(\xi)}{t} > x\right] \to 0 \qquad (x > 0,\ t \to \infty). \tag{54} $$

To see this, let K be a large number and write

$$ tP\left[\frac{h(\xi)}{t} > x\right] = tP\left[\frac{h(\xi)}{t} > x,\ \xi \le tK\right] + tP\left[\frac{h(\xi)}{t} > x,\ \xi > tK\right] \le o(1) + tP[\xi > tK] \to K^{-1}. $$

The first term is eventually 0, because on $\{\xi \le tK\}$ we have $h(\xi)/t \le h(tK)/t \to 0$. The upper bound is arbitrarily small, and thus we have verified (54).

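Now we check that (X,Y) is standard regularly varying and asymptotically independent:

$$ \begin{aligned} tP\left[\frac{X}{t} > x \text{ or } \frac{Y}{t} > y\right] &= \frac{t}{2}P\left[\frac{U_1}{t} > x \text{ or } \frac{h(V_1)}{t} > y\right] + \frac{t}{2}P\left[\frac{h(U_2)}{t} > x \text{ or } \frac{V_2}{t} > y\right] \\ &= o(1) + \frac{t}{2}P\left[\frac{U_1}{t} > x\right] + \frac{t}{2}P\left[\frac{V_2}{t} > y\right] \\ &\to \frac12\Bigl(\nu\bigl((x,\infty] \times [0,\infty]\bigr) + \nu\bigl([0,\infty] \times (y,\infty]\bigr)\Bigr). \end{aligned} $$

Note we applied (54). Next consider

$$ \begin{aligned} tP\left[\frac{X - h(t)}{g(t)} \le x,\ \frac{Y}{t} > y\right] &= o(1) + \frac{t}{2}P\left[\frac{h(U_2) - h(t)}{g(t)} \le x,\ \frac{V_2}{t} > y\right] \\ &= o(1) + \frac{t}{2}P\left[\frac{U_2}{t} \le \frac{h^{\leftarrow}(g(t)x + h(t))}{t},\ \frac{V_2}{t} > y\right] \\ &\to \frac12\,\nu\bigl([0,e^x] \times (y,\infty]\bigr), \end{aligned} $$

since $h \in \Pi(g)$ implies $h^{\leftarrow}(h(t) + g(t)x)/t \to e^x$. This exemplifies case (ii) of (14) with $\rho = 0$, $\beta(t) = h(t)$ and $\alpha(t) = g(t)$. The form of the conditioned limit is

$$ \lim_{t\to\infty} P\left[\frac{X - h(t)}{g(t)} \le x \,\middle|\, Y > t\right] = \nu\bigl([0,e^x] \times (1,\infty]\bigr), \qquad x \in \mathbb{R}. $$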
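The ρ = 0 case can be simulated in the same way, using the example $h(t) = \log t$, $g(t) = 1$ mentioned above. As in the previous sketch, we make the hypothetical choice $U = V = R$ with R standard Pareto. The conditioned limit $\nu([0,e^x] \times (1,\infty]) = (1 - e^{-x})^+$ is then the standard exponential distribution function. So $X - \log t$ given $Y > t$ should look approximately exponential. The function names and the values of t and n are ours.

```python
import math
import random


def centred_x_given_exceedance(t, n, seed=0):
    """Simulate (52) with h = log (so g = 1) and U = V = R standard Pareto.

    Returns (X - h(t)) / g(t) = X - log t for the draws with Y > t.
    """
    rng = random.Random(seed)
    kept = []
    for _ in range(n):
        r = 1.0 / (1.0 - rng.random())    # P[R > r] = 1/r for r >= 1
        if rng.random() < 0.5:            # B = 1: (X, Y) = (U1, h(V1))
            x, y = r, math.log(r)
        else:                             # B = 0: (X, Y) = (h(U2), V2)
            x, y = math.log(r), r
        if y > t:
            kept.append(x - math.log(t))
    return kept


def limit_H(x):
    """nu([0, e^x] x (1, inf]) for nu on the diagonal: 1 - e^(-x) for x >= 0, else 0."""
    return max(0.0, 1.0 - math.exp(-x))


def ecdf(values, x):
    """Fraction of values that are <= x."""
    return sum(1 for v in values if v <= x) / len(values)


if __name__ == "__main__":
    sample = centred_x_given_exceedance(t=20.0, n=200_000, seed=4)
    for x in (0.5, 1.0, 2.0):
        print(x, round(ecdf(sample, x), 3), round(limit_H(x), 3))
```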
7. Change of coordinate system. How much freedom do we have to measure the X-variable in different units? This issue was raised in the discussion to Heffernan and Tawn [21], and we try to offer further insight on the matter here. In Sections 6.1.2 and 6.1.3 we saw that for (X,Y) bivariate normal, it was possible to transform $X \mapsto \log b^{\leftarrow}(X)$ and get a conditional limit, but the transformation $X \mapsto b^{\leftarrow}(X)$ did not preserve existence of conditional limits. Can something more general be said about this issue?

Starting with (8), where the Y-variable is standardized, for what monotone increasing functions $h(\cdot)$ do there exist centering and scaling functions $\alpha_2(t) > 0$, $\beta_2(t) \in \mathbb{R}$ such that, for some limit measure $\mu_2$ satisfying the nondegeneracy assumptions at the beginning of Section 2, we have

$$ tP\left[\left(\frac{h(X) - \beta_2(t)}{\alpha_2(t)}, \frac{Y}{t}\right) \in \cdot\,\right] \xrightarrow{v} \mu_2 \tag{55} $$

in $M_+([-\infty,\infty] \times (0,\infty])$? This problem has many similarities to ones considered in [2, 33], and the experience gained in Section 6.1.3 is helpful.

In (8), assume centering by $\beta(t)$ is really necessary; that is, suppose it is not the case that $\beta(t) = o(\alpha(t))$. [If $\beta(t) = o(\alpha(t))$, the following arguments are easier and lead to regular variation of h.] Assume (55) and rewrite the left side of (55) evaluated on $[-\infty,x] \times (y,\infty]$ as

$$ tP\left[\frac{X - \beta(t)}{\alpha(t)} \le \frac{h^{\leftarrow}\bigl(\alpha_2(t)x + \beta_2(t)\bigr) - \beta(t)}{\alpha(t)},\ \frac{Y}{t} > y\right]. $$

Since this converges, there must exist a limit $\psi(x)$ such that

$$ \frac{h^{\leftarrow}\bigl(\alpha_2(t)x + \beta_2(t)\bigr) - \beta(t)}{\alpha(t)} \to \psi(x), \tag{56} $$

and then we see that

$$ \mu\bigl([-\infty,\psi(x)] \times (y,\infty]\bigr) = \mu_2\bigl([-\infty,x] \times (y,\infty]\bigr). \tag{57} $$

The limit ψ cannot be constant without violating the nondegeneracy assumption for $\mu_2$. Inverting (56) we get

$$ \frac{h\bigl(y\alpha(t) + \beta(t)\bigr) - \beta_2(t)}{\alpha_2(t)} \to \psi^{\leftarrow}(y). $$

This suggests we set

$$ \beta_2(t) = h(\beta(t)), \tag{58} $$

since then

$$ \frac{h\bigl(y\alpha(t) + \beta(t)\bigr) - h(\beta(t))}{\alpha_2(t)} \to \psi^{\leftarrow}(y) - \psi^{\leftarrow}(0) =: \chi(y), \tag{59} $$

and, presuming $\chi(1) > 0$, we could set

$$ \alpha_2(t) = h\bigl(\alpha(t) + \beta(t)\bigr) - h(\beta(t)). $$

We now look at some possible forms of h which allow change of coordi- 
nate system (55). We do not achieve necessary and sufficient conditions but 
come to an understanding of how to generate broad classes of functions h 
permitting nonlinear transformation of X. 

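7.1. Case A: α(t) is asymptotically a constant. Assume $\beta(t) \uparrow \infty$ as $t \to \infty$. If $\alpha \sim 1$, then

$$ \frac{h\bigl(y + \beta(t)\bigr) - h(\beta(t))}{\alpha_2(t)} \to \chi(y), $$

and changing variables yields

$$ \frac{h(y + t) - h(t)}{\alpha_2(\beta^{\leftarrow}(t))} \to \chi(y), $$

or

$$ \frac{h(\log(tx)) - h(\log t)}{\alpha_2(\beta^{\leftarrow}(\log t))} \to \chi(\log x), \qquad x > 0. \tag{60} $$

Since $h \circ \log$ is nondecreasing, either [11]

(a) $h \circ \log \in RV_p$, $p > 0$, in which case $\alpha_2(\beta^{\leftarrow}(\log t)) \sim h(\log t)$,

or

(b) $h \circ \log \in \Pi(\alpha_2 \circ \beta^{\leftarrow} \circ \log)$.

Conclusion: If $\alpha \sim 1$, we may change coordinates $X \mapsto h(X)$, provided $h \circ \log \in RV_p \cup \Pi(\alpha_2 \circ \beta^{\leftarrow} \circ \log)$.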

Remark 3. 1. In Section 6.1.3, α(t) = 1. We tried $h(x) = b^{\leftarrow}(x)$ but did not get a conditioned limit law. In Section 6.1.3, $h \circ \log = b^{\leftarrow} \circ \log$ is neither regularly varying nor Π-varying.

2. In Section 6.1.2, α(t) = 1. We tried $h(x) = \log b^{\leftarrow}(x)$, which led to a conditioned limit law because Lemma 1 proved $h \circ \log = \log \circ\, b^{\leftarrow} \circ \log \in \Pi(\log)$.

3. The result in (b) suggests how to construct other examples of h which lead to conditioned limits. If g is any slowly varying function, then $\int_1^x g(u)u^{-1}\,du$ is Π-varying with auxiliary function g ([14], [34], page 30). Define h by $h(\log x) = \int_1^x g(u)u^{-1}\,du$, or

$$ h'(x) = g(e^x), \qquad h(x) = \int_0^x g(e^u)\,du. $$

Any such h will lead to a conditioned limit. Examples include:

• $g(x) = \log x$ and $h(x) = x^2/2$.

• $g(x) = \log\log x$ and $h(x) = \int_0^x \log u\,du \sim x\log x$.

• $g(x) = (\log x)^p$ and $h(x) = x^{p+1}/(p+1)$ for $p > 0$.
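The three examples in item 3 can be checked with a few lines of arithmetic. Put $s = \log t$ and $c = \log x$. Then Π-variation of $h \circ \log$ with auxiliary function g means $\bigl(h(s + c) - h(s)\bigr)/g(e^s) \to c$. The sketch below evaluates this ratio for the listed pairs, writing the second h in its exact form $x\log x - x$ and taking $p = 2$ in the third pair as an arbitrary choice. Working in s avoids ever forming $t = e^s$.

```python
import math

# (g(e^s), h(s)) for the examples of Remark 3, item 3; p = 2 in the last one.
EXAMPLES = {
    "g = log, h = x^2/2": (lambda s: s, lambda s: s * s / 2.0),
    "g = log log, h = x log x - x": (lambda s: math.log(s), lambda s: s * math.log(s) - s),
    "g = (log)^2, h = x^3/3": (lambda s: s ** 2, lambda s: s ** 3 / 3.0),
}


def pi_ratio(g_exp, h, s, c):
    """(h(log t + c) - h(log t)) / g(t) with s = log t; tends to c when h o log is in Pi(g)."""
    return (h(s + c) - h(s)) / g_exp(s)


if __name__ == "__main__":
    for name, (g_exp, h) in EXAMPLES.items():
        print(name, [round(pi_ratio(g_exp, h, s, 1.5), 4) for s in (1e2, 1e4, 1e6)])
```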

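For an example where $h \circ \log \in RV_p$ for $p > 0$, set

$$ h(\log x) = U(x) \in RV_p, \quad\text{or}\quad h(x) = U(e^x). $$

Apply this to the convergence (42) for the bivariate normal pair (X,Y), where recall

$$ \beta(t) = \rho b(t), \qquad \alpha(t) = 1, \qquad \mu\bigl([-\infty,x] \times (y,\infty]\bigr) = N\Bigl(\frac{x}{\sqrt{1-\rho^2}}\Bigr)y^{-1}. $$

Then evaluating (60) with $h(\log t) = U(t) \in RV_p$, $p > 0$, gives, with $\alpha_2 \circ \beta^{\leftarrow} \circ \log = U$, that

$$ \frac{U(tx) - U(t)}{U(t)} \to x^p - 1 = \chi(\log x). $$

Therefore $\chi(y) = e^{py} - 1$, and from (57)

$$ tP\left[\frac{U(e^X) - U(e^{\rho b(t)})}{U(e^{\rho b(t)})} \le x,\ \frac{b^{\leftarrow}(Y)}{t} > y\right] \to \mu\bigl([-\infty, \chi^{\leftarrow}(x)] \times (y,\infty]\bigr) = N\Bigl(\frac{\chi^{\leftarrow}(x)}{\sqrt{1-\rho^2}}\Bigr)y^{-1}. $$

So for this example, $\beta_2(t) = \alpha_2(t) = U(e^{\rho b(t)})$.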

7.2. Case B: α(t) is not asymptotically a constant. Again assume $\beta(t) \uparrow \infty$ as $t \to \infty$. Transform (59) to get

$$ \frac{h\bigl(t + \alpha \circ \beta^{\leftarrow}(t)\,y\bigr) - h(t)}{\alpha_2 \circ \beta^{\leftarrow}(t)} \to \chi(y), $$

which is of the form

$$ \frac{h\bigl(t + f(t)y\bigr) - h(t)}{a^*(t)} \to \chi(y). $$

To proceed further in a way that generates a broad class of examples, suppose $f(t) = \alpha \circ \beta^{\leftarrow}(t)$ is self-neglecting [4]. A simple sufficient condition is $f'(t) \to 0$, and f self-neglecting means it is the auxiliary function of a Γ-varying function H (see Appendix A.2), so that

$$ \frac{H\bigl(t + f(t)x\bigr)}{H(t)} \to e^x, \quad\text{that is,}\quad H \in \Gamma(f). $$

Then, defining the function V by

$$ h = V \circ H \quad\text{or equivalently}\quad V = h \circ H^{\leftarrow}, $$

we have either ([14], page 249, [34], page 36)

(a) $V \in \Pi$ and $\chi(y) = \log e^y = y$;

or

(b) $V \in RV_p$, $p > 0$, and $\chi(y) = e^{py} - 1$.

Conclusion: We considered the case that $\beta \ne o(\alpha)$, $\beta \uparrow \infty$, and α not asymptotically a constant. For such a case, the change of variable $X \mapsto h(X)$ preserves conditioned limits provided h is either the composition of a Π-varying function and a Γ-varying function or the composition of a regularly varying function and a Γ-varying function. (The composition of a regularly varying function and a Γ-varying function is another Γ-varying function; see [12], [34], page 36.)

8. Discussion and concluding remarks. The statistical models proposed by Heffernan and Tawn [21] are based on the assumption that, for (X,Y) having Gumbel marginal distributions, there exist normalizing functions α(·) and β(·) such that the conditional distribution of $(X - \beta(y))/\alpha(y)$ given $Y = y$ can be approximated for large y by some nondegenerate, proper G(x). We have built our theory by standardizing Y to have an asymptotically Pareto distribution and looked at the conditional distribution of $(X - \beta(t))/\alpha(t)$ given $Y > t$, which also leads to conditional distributions for $(X - \beta(Y))/\alpha(Y)$ given $Y > t$. This formulation is consistent with the Heffernan and Tawn [21] approach and allows a mathematically precise theory which can be related to the extended theory of multivariate regular variation.

From the perspective of statistical modeling, important results are contained in Propositions 4 and 5. These propositions reveal the factorization of the limit distribution obtained when X is normalized by the value of Y that occurs with it. This factorization permits a significant simplification of models based on the limit form, as it enables the assumption of limiting independence between the conditioning and standardized variables. This independence assumption was employed in [21] and is key to statistical modeling and extrapolation.

One issue we have not resolved is consistency of different models. The 
definition (5) or its standardized version (8) is not symmetric in the X, Y 
variables. However, when fitting models to data one has a choice of which 
variable to condition being large and a logical issue is whether the various 
models obtained by conditioning on different variables are related to each 
other in any way. Conditions for consistency would strengthen the statistical 
model assumptions based on this representation and therefore potentially 
improve the ability of such approaches to describe the joint distribution in 
tail regions where there is naturally little data. Currently we have nothing 
terribly useful to say on this issue other than to point out that it seems 
important to understand consistency better. 
APPENDICES

For convenience, this section collects some notation, needed background 
on regular variation and notions on vague convergence needed for some for- 
mulations and proofs. 

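A.1. Vector notation. Vectors are denoted by bold letters, capitals for random vectors and lower case for nonrandom vectors. For example: $\mathbf{x} = (x^{(1)}, \ldots, x^{(d)}) \in \mathbb{R}^d$. Operations between vectors should be interpreted componentwise, so that for two vectors $\mathbf{x}$ and $\mathbf{z}$

$$ \begin{aligned} \mathbf{x} \le \mathbf{z} &\quad\text{means}\quad x^{(i)} \le z^{(i)},\ i = 1,\ldots,d, \\ \mathbf{x} < \mathbf{z} &\quad\text{means}\quad x^{(i)} < z^{(i)},\ i = 1,\ldots,d, \\ \mathbf{x} = \mathbf{z} &\quad\text{means}\quad x^{(i)} = z^{(i)},\ i = 1,\ldots,d, \\ \mathbf{z}\mathbf{x} &= \bigl(z^{(1)}x^{(1)}, \ldots, z^{(d)}x^{(d)}\bigr), \\ \mathbf{x} \vee \mathbf{z} &= \bigl(x^{(1)} \vee z^{(1)}, \ldots, x^{(d)} \vee z^{(d)}\bigr), \\ \frac{\mathbf{x}}{\mathbf{z}} &= \Bigl(\frac{x^{(1)}}{z^{(1)}}, \ldots, \frac{x^{(d)}}{z^{(d)}}\Bigr), \end{aligned} $$

and so on. Also define $\mathbf{0} = (0, \ldots, 0)$. For a real number c, denote as usual $c\mathbf{x} = (cx^{(1)}, \ldots, cx^{(d)})$. We denote the rectangles (or the higher dimensional intervals) by

$$ [\mathbf{a},\mathbf{b}] = \{\mathbf{x} \in \mathbb{R}^d : \mathbf{a} \le \mathbf{x} \le \mathbf{b}\}. $$

Higher dimensional rectangles with one or both endpoints open are defined analogously, for example,

$$ (\mathbf{a},\mathbf{b}] = \{\mathbf{x} \in \mathbb{R}^d : \mathbf{a} < \mathbf{x} \le \mathbf{b}\}. $$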

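A.2. The function classes Π and Γ. Continue the domain of attraction discussion: writing (3) as

$$ \frac{1}{t\bigl(1 - F(a(t)x + b(t))\bigr)} \to (1 + \gamma x)^{1/\gamma} $$

and inverting yields, as $t \to \infty$,

$$ \frac{b(tx) - b(t)}{a(t)} \to \begin{cases} \dfrac{x^{\gamma} - 1}{\gamma}, & \text{if } \gamma \ne 0, \\[1ex] \log x, & \text{if } \gamma = 0. \end{cases} \tag{62} $$

In case γ = 0, (62) says that $b(\cdot) \in \Pi(a(\cdot))$; that is, the function $b(\cdot)$ is Π-varying with auxiliary function $a(\cdot)$ ([34], pages 26ff, [4, 11, 12]).

More generally ([4], Chapter 3, [18]), define for an auxiliary function $a(t) > 0$, $\Pi_+(a)$ to be the set of all functions $\pi : \mathbb{R}_+ \mapsto \mathbb{R}_+$ such that

$$ \lim_{t\to\infty} \frac{\pi(tx) - \pi(t)}{a(t)} = k\log x, \qquad x > 0,\ k > 0. \tag{63} $$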
The class $\Pi_-(a)$ is defined similarly except that $k < 0$, and

$$ \Pi(a) = \Pi_+(a) \cup \Pi_-(a). $$

By adjusting the auxiliary function in the denominator, it is always possible to assume $k = \pm 1$.

Two functions $\pi_i \in \Pi_{\pm}(a)$, $i = 1,2$, are $\Pi(a)$-equivalent if for some $c \in \mathbb{R}$

$$ \lim_{t\to\infty} \frac{\pi_1(t) - \pi_2(t)}{a(t)} = c. $$

There is usually no loss of generality in assuming $c = 0$.

The class of regularly varying functions with index $\rho \in \mathbb{R}$ is denoted by $RV_{\rho}$, so that $U : \mathbb{R}_+ \mapsto \mathbb{R}_+$ satisfies $U \in RV_{\rho}$ if

$$ \lim_{t\to\infty} \frac{U(tx)}{U(t)} = x^{\rho}, \qquad x > 0. \tag{64} $$

The following are known facts about Π-varying functions.

1. We have $\pi \in \Pi_+(a)$ iff $1/\pi \in \Pi_-(a/\pi^2)$.

2. If $\pi \in \Pi_+(a)$, then ([4], page 159 or [18], page 1031) there exists a continuous and strictly increasing $\Pi(a)$-equivalent function $\pi_0$ with $\pi - \pi_0 = o(a)$.

3. If $\pi \in \Pi_+(a)$, then

$$ \lim_{t\to\infty} \pi(t) =: \pi(\infty) $$

exists. If $\pi(\infty) = \infty$, then $\pi \in RV_0$ and $\pi(t)/a(t) \to \infty$. If $\pi(\infty) < \infty$, then $\pi(\infty) - \pi(t) \in \Pi_-(a)$ and $\pi(\infty) - \pi(t) \in RV_0$, and $(\pi(\infty) - \pi(t))/a(t) \to \infty$. (Cf. [11], page 25.) Furthermore,

$$ \frac{1}{\pi(\infty) - \pi(t)} \in \Pi_+\bigl(a/(\pi(\infty) - \pi(t))^2\bigr). $$

In addition to the function class Π we need de Haan's class Γ ([4, 11, 12, 13, 34]). A function $V : \mathbb{R}_+ \mapsto \mathbb{R}_+$ is a Γ-function with auxiliary function f [written $V \in \Gamma(f)$] if, as $t \to \infty$,

$$ \frac{V(t + xf(t))}{V(t)} \to e^x, \qquad x \in \mathbb{R}. $$

For V nondecreasing, $V \in \Gamma(f)$ iff $V^{\leftarrow} \in \Pi(f \circ V^{\leftarrow})$.

A.3. Vague convergence. For a nice space E, that is, a space which is locally compact with countable base (e.g., a finite dimensional Euclidean space), denote by $M_+(E)$ the nonnegative Radon measures on Borel subsets of E. This space is metrized by the vague metric. The notion of vague convergence in this space is as follows: if $\mu_n \in M_+(E)$ for $n \ge 0$, then $\mu_n$ converges vaguely to $\mu_0$ (written $\mu_n \xrightarrow{v} \mu_0$) if for all bounded continuous functions f with compact support we have

$$ \int_E f\,d\mu_n \to \int_E f\,d\mu_0 \qquad (n \to \infty). $$

This concept allows us to write (3) as

$$ tP\left[\frac{X - b(t)}{a(t)} \in \cdot\,\right] \xrightarrow{v} m_{\gamma}(\cdot) \tag{65} $$

vaguely in $M_+((-\infty,\infty])$, where

$$ m_{\gamma}\bigl((x,\infty]\bigr) = (1 + \gamma x)^{-1/\gamma}. $$

Standard references include [22, 29] and [34], Chapter 3.